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Foreword 


In  this  research  effort,  the  investigators  sought  a  unified,  systematic,  and  formal 
approach  to  modeling  and  control  of  multi-agent  systems,  drawing  inspiration 
from  biological  systems  exhibiting  swarm  behavior.  Previously  developed  works 
in  the  application  of  graph  theory  were  furthered  and  applied  to  a  variety  of  dif¬ 
ferent  task-oriented  multi-agent  systems.  New  distributed  on-board  controllers 
were  developed  and  implemented  at  the  individual  level,  yielding  collective  re¬ 
sults  in  exploration,  perimeter  detection  and  surveillance,  and  formation  con¬ 
trol  that  are  indistinguishable  from  similar  centrally  controlled  systems.  This 
distributed  control  yields  robustness  and  decreased  complexity  relative  to  cen¬ 
tralized  controllers.  Simulations  and  experiments  were  conducted  to  validate 
the  analytical  results.  The  results  of  this  research  have  been  published  in  a 
variety  of  conference  proceedings  and  journal  articals,  which  are  reproduced  in 
the  appendix.  This  document  serves  to  report  the  research  conducted  with  the 
support  of  the  Army  Research  Office  under  grant  number  W911NF-08-0106. 
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Chapter  1 

Statement  of  Problem 


Swarm  control  has  gained  considerable  attention  recently  among  the  robotics 
research  community.  The  attention  is  primarily  motivated  by  the  existence  of 
very  robust  biological  counterparts  such  as  swarming  in  ants,  flocking  of  birds, 
schooling  in  fishes,  and  even  cellular  interactions.  The  primary  feature  of  these 
systems  that  has  attracted  researchers  is  that  the  intelligence  associated  with  an 
individual  agent  (e.g.  ant  or  bird)  is  very  primitive,  and  it  utilizes  interactions 
at  local  level  to  arrive  at  very  simple  decisions.  This  behavior  at  the  local  level 
emerges  into  a  group  behavior  that  appears  to  be  very  robust  and  complex.  This 
observation  in  biological  systems  has  led  the  interested  scientists  and  engineers 
to  investigate  multiple  agent  cooperative  controls  problem  using  a  bottom-up 
approach. 

1.1  Motivation 

Several  of  the  future  military  missions  are  likely  to  depend  on  a  multitude  of  in¬ 
terconnected  autonomous  vehicles  working  cooperatively  in  a  hostile,  dynamic, 
and  uncertain  environment  and  equipped  with  sensing  and  communication  ca¬ 
pabilities.  A  few  of  the  potential  applications  of  such  multi-vehicle  systems 
include  cooperative  search  and  rescue  operation,  surveillance,  reconnaissance, 
and  perimeter  protection.  Advances  in  communication  and  sensing  technologies, 
and  in  computing  resources  have  made  it  possible  to  coordinate  the  movement 
of  several  autonomous  vehicles  working  cooperatively  to  achieve  a  specified  mis¬ 
sion.  However,  such  systems  present  considerable  technological  challenges  in 
terms  of  management  and  control  of  large  numbers  of  vehicles,  communication 
between  vehicles  and  with  a  central  supervisory  controller,  and  estimation  from 
sensory  sources.  This,  added  to  the  fact  that  most  military  applications  take 
place  in  very  harsh,  dynamic,  uncertain,  and  hostile  environments,  necessitates 
the  development  of  a  robust  and  fault  tolerant  system  that  can  operate  efficiently 
in  real  time. 
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1.2  Research  Goals 

There  are  several  issues  that  arise  while  designing  such  a  system  of  autonomously 
acting  vehicles  working  together  as  a  swarm.  Most  critical  are  the  stability,  con¬ 
vergence,  and  robustness  characteristics  of  these  systems  operating  in  dynamic 
and  hostile  environments.  The  scalability  of  the  concepts  and  algorithms  is 
important  since  a  large  number  of  autonomous  agents  constitute  the  overall 
system. 

Another  important  issue  is  cohesiveness.  The  system  should  remain  con¬ 
nected  and  algorithms  should  not  allow  fragmentation  in  groups  of  agents.  The 
algorithms  should  include  inbuilt  mechanisms  for  obstacle  avoidance  and  for 
carrying  out  split /rejoin  maneuvers.  Swarm  systems  would  use  sensors  and 
communication  devices  for  maintaining  the  interconnections  and  exchanging  in¬ 
formation.  Such  interconnections  are  liable  to  be  broken.  The  coordination 
algorithms  should  be  robust  and  tolerant  to  such  sporadic  broken  connections. 
Finally,  the  group  of  vehicles  should  be  able  to  collectively  organize  themselves 
in  an  optimal  manner  to  achieve  a  specified  objective  or  mission. 

In  view  of  the  factors  mentioned  above,  an  in-depth  research  effort  was 
conducted  towards  accomplishing  the  following  goals: 

•  Development  of  a  comprehensive  and  unified  analytical  approach  to  man¬ 
age,  control,  and  coordinate  a  large  number  of  autonomous  mobile  agents. 

•  Formulation  of  graph  theoretic  techniques  that  draw  inspiration  from  com¬ 
plex  network  theory  to  accurately  model  the  connection  uncertainties  in 
these  complicated  systems  of  interconnected  agents  in  dynamic  environ¬ 
ments  via  probabilistic  and  random  graph  theory,  and  derivation  of  novel 
results  for  various  quantities  of  significance  such  as  connectedness,  diam¬ 
eter,  and  cluster  size. 

•  Establishment  of  novel  control  methodologies  motivated  from  self-organizing 
behaviors  found  in  biological  systems  that  can  be  adapted  to  control 
and  coordinate  a  large  number  of  autonomous  agents  working  together 
to  achieve  a  defined  objective. 

•  Development  of  innovative  methods  at  the  intersection  of  systems  and 
control  theory  and  graph  theory  to  analyze  the  stability,  robustness,  con¬ 
vergence  properties,  and  cohesion  of  such  interconnected  agents. 

•  Design  of  experiments  using  a  testbed  consisting  of  several  mobile  agents 
with  embedded  processing,  sensing,  and  communication  capabilities. 

•  Extensive  verification  of  proposed  algorithms  via  computer  simulations,  as 
well  as  via  laboratory  experiments,  representing  a  variety  of  cooperative 
control  applications  such  as  formation  of  various  shapes,  cooperative  data 
gathering  and  area  coverage,  and  perimeter  protection. 

Each  of  these  objectives  has  been  investigated,  leading  to  novel  control 
methodologies  and  practical  solutions.  In  some  cases,  full  implementation  was 
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not  possible  due  to  practical  issues  relating  to  available  equipment.  In  these 
cases,  the  limitations  were  carefully  observed,  and  workarounds  proposed. 

The  next  chapter  includes  summarized  results  of  each  component  of  our 
research.  The  appendix  includes  the  collection  of  peer-reviewed  journal  papers 
and  conference  papers  in  which  our  research  is  described  in  much  greater  detail. 
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Chapter  2 

Summary  of  Results 


The  work  done  under  and  outcomes  of  this  research  have  been  reported  in  several 
journal  articles  [1-4]  and  peer-reviewed  conference  proceedings  [5-20]. 


2.1  Self-sorting  of  Heterogeneous  Swarm  Popu¬ 
lations 

Sorting  and  pattern  formation  of  heterogeneous  units  are  self-organized  behav¬ 
iors  seen  in  many  biological  systems.  One  of  the  best  examples  of  such  systems 
is  a  system  of  biological  cells  of  heterogeneous  types  that  has  the  ability  to  self- 
organize  into  specific  formations,  form  different  types  of  organs  and,  ultimately, 
develop  into  a  living  organism. 

Earlier  research  in  this  area  has  indicated  that  such  pattern  formation  and 
self-sorting  behaviors  in  biological  cells  and  tissues  are  made  possible  because  of 
difference  in  the  adhesivity  between  different  types  of  cells  or  tissues.  Inspired 
by  this  differential  adhesivity  model,  our  research  explored  a  decentralized  ap¬ 
proach  based  on  differential  artificial  potential  to  achieve  the  self-sorting  and 
pattern  formation  behaviors  by  a  swarm  of  heterogeneous  robotic  agents.  The 
method  is  based  on  the  proposition  that  agents  of  different  types  experience  dif¬ 
ferent  magnitude  of  potential  while  they  are  interacting  with  agents  of  different 
types. 

Our  research  examined  the  stability  analysis  of  a  population  of  agents  in  the 
Lyapunov  sense,  and  has  laid  down  an  analytical  foundation  for  the  synthesis  of 
controllers  for  aggregation  and  segregation  with  the  artificial  potential  function 
framework.  Specifically,  conditions  for  the  synthesis  of  controllers  for  aggrega¬ 
tion  and  segregation  was  analyzed  in  one  and  two  dimensional  space.  Extensive 
simulation  studies  and  numerical  analysis  suggest  that  the  proposed  methods 
for  segregation  or  aggregation  would  always  lead  a  population  of  heterogeneous 
agents  closer  to  the  sorted  (i.e.,  segregated)  or  aggregated  configuration,  re¬ 
spectively.  Details  of  this  research  are  reported  in  [5,  16]  and  reproduced  in 
Appendix  A  in  Section  A. 2.1  and  Section  A. 2. 12  respectively. 


2.2  The  Role  of  Stochasticity  for  Swarm  Cohe¬ 
sion 

This  component  of  the  research  focused  on  the  effectiveness  of  designed  random 
behavior  in  self-organization  of  swarm  of  robotic  agents.  Inspired  by  the  self¬ 
organization  observed  in  biological  cells  and  the  role  played  by  random  forces  in 
providing  robustness  in  cell  self-organization,  we  investigated  the  possibility  of 
designing  a  decentralized  controller  for  a  swarm  of  agents  in  which  the  stochas¬ 
tic  process  is  included.  Flocking  was  considered  as  a  self-organizing  behavior 
example  to  validate  our  findings.  The  controller  is  designed  in  the  framework 
of  Lyapunov  function,  and  it  is  based  on  the  artificial  potential  due  to  interac¬ 
tions  among  agents.  Our  analysis  shows  that  the  flocking  behavior  of  the  swarm 
is  improved  and  is  more  robust  when  the  stochastic  process  is  included  in  the 
agent  controller. 

Flocking  is  a  self-organized  behavior  in  which  agents,  initially  distributed 
over  the  operating  space,  group  together  and  organize  into  a  specific  formation. 
An  example  of  this  kind  of  flocking  is  the  formation  of  germinal  centers  inside 
the  lymph  node  [21].  Flocking  has  been  extensively  studied  in  multi-agent 
literature.  Reynolds  [22]  has  been  able  to  reproduce,  in  his  computer  models, 
behaviors  representing  flocking  in  birds  and  schooling  in  fish  using  simple  rules 
based  on  local  interactions  among  agents.  Drawing  inspiration  from  Reynolds 
approach,  many  researchers  have  focused  on  designing  a  decentralized  controller 
for  achieving  flocking  behavior  [23-25] .  A  control  system  based  on  the  methods 
presented  in  these  references  should  be  able  to  yield  a  single  flock  of  agents  based 
on  local  information.  However,  if  the  local  information  can  only  be  collected 
over  a  finite  range,  it  can  lead  to  the  formation  of  more  than  one  flock. 

This  research  focused  on  scenarios  in  which  local  interaction  leads  to  the  frag¬ 
mentation  of  groups  and  investigates  how  the  introduction  of  random  processes 
in  an  agent  controller  helps  to  eliminate  or  alleviate  the  problem  of  fragmenta¬ 
tion.  We  consider  flocking  in  the  Lyapunov  function  framework.  The  Lyapunov 
function  for  the  controller  design  of  a  multi-agent  system  is  based  on  the  ar¬ 
tificial  potential  function  of  agent  interactions  [26,  27].  Inspired  by  the  role 
played  by  the  random  fluctuation  in  driving  self-organizing  processes  in  cells, 
we  include  a  random  process  in  the  decentralized  controller  for  an  agent  swarm. 
Consequently,  we  analyze  the  swarm  selforganization  using  the  stochastic  Lya¬ 
punov  function.  We  show  that  our  proposed  decentralized  controller  provides 
the  robust  flocking  of  the  swarm  of  agents. 

We  investigated  the  possibility  of  designing  a  decentralized  controller  for  a 
swarm  of  agents  in  which  a  stochastic  process  is  included.  We  considered  the 
flocking  behavior  of  a  swarm  and  described  the  previously  considered  deter¬ 
ministic  Lyapunov  function  controller  design  based  on  the  artificial  potential  of 
interactions  among  agents.  The  deterministic  design  may  not  lead  to  desired 
swarm  behaviors  because  of  the  existence  of  agent  configurations  in  which  the 
total  potential  of  swarm  interactions  has  local  minima.  Comparisons  of  the 
results  using  determinstic  and  stochastic  controllers  are  shown  in  Figure  2.1, 
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Figure  2.1:  Agent  clustering  under  deterministic  controller. 


Figure  2.2,  and  Figure  2.3. 

The  design  we  considered  is  an  extension  of  the  deterministic  one.  We  used 
the  same  Lyapunov  function.  However,  due  to  the  introduced  stochastic  process, 
the  Lyapunov  function  is  also  a  stochastic  process.  Consequently,  this  controller 
provides  a  means  of  escaping  from  the  local  minima  of  the  total  potential  of 
the  swarm  interactions.  The  intensity  of  the  included  stochastic  process  is  of 
great  significance.  If  its  intensity  is  small,  then  the  swarm  behavior  is  the 
same  as  if  controlled  by  the  deterministic  controller.  If  the  intensity  is  high, 
then  the  flocking  behavior  cannot  be  established  because  the  swarm  can  even 
escape  from  the  region  around  the  global  minimum.  Only  if  the  intensity  of  the 
stochastic  process  lies  within  some  intermediate  range,  does  the  swarm  escape 
the  local  minima  and  randomly  explores  configurations,  which  can  ultimately 
lead  the  swarm  towards  configurations  close  to  the  global  minimum  of  the  total 
potential  of  interactions.  The  drawback  of  this  design  is  that  even  in  the  global 
minimum  configuration,  the  stochastic  process  forces  the  swarm  to  search  for 
a  better  configuration.  Tuning  the  intensity  of  the  stochastic  process  of  the 
controller  is  identical  to  consideration  that  appears  in  the  design  of  the  realistic 
model  of  cell  behavior,  or  to  the  problem  of  providing  physiological  conditions 
for  self-organization  in  biological  cells. 

The  performance  of  the  proposed  controller  was  illustrated  by  the  deriva¬ 
tion  of  the  emergent  swarm  behavior  and  an  extensive  simulation  study.The 
performance  of  the  controller  with  the  random  process  term  was  compared  to 
the  performance  of  the  deterministic  controller.  The  simulation  results  suggest 
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Figure  2.2:  Agent  clustering  under  stochastic  controller. 


Figure  2.3:  Cluster  frequency  with 


Number  of  clusters 

without  stochastic  component. 
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that  the  inclusion  of  the  random  process  in  the  controller  can  certainly  improve 
the  performance  of  the  robotic  swarm  in  achieving  the  flocking  behavior.  Ad¬ 
ditional  details  of  this  research  can  be  found  in  [6],  reproduced  in  Appendix  A 
in  Section  A. 2. 2. 


2.3  Kalman  Smoother  Based  Force  Localization 
and  Mapping 

Motility  is  an  important  property  of  immune  system  cells.  It  provides  cells 
with  the  ability  to  perform  their  function  not  only  at  the  right  time  but  also 
at  the  right  place.  In  this  research,  we  introduce  the  problem  of  modeling  and 
estimating  an  effective  force  field  directing  cell  movement  by  the  analysis  of 
intravital  video  microscopy.  A  computational  approach  is  proposed  for  solving 
this  problem  without  dealing  with  a  parametrized  spatial  model  of  the  field  in 
order  to  avoid  potential  errors  due  to  inaccurate  spatial  model  assumptions. 

We  consider  the  dynamics  of  cells  similar  to  the  dynamics  of  distributed 
agents  typically  used  in  the  field  of  swarm  robotics.  The  method  utilizes  a 
fixed-interval  Kalman  filter  based  smoother.  Its  application  results  in  a  map 
giving  the  intensity  and  direction  of  the  effective  force  field.  The  results  show 
that  real-time  video  images  are  a  source  of  data,  enabling  us  to  visualize  in¬ 
triguing  spatiotemporal  phenomena  inside  immune  system  organs.  The  pro¬ 
posed  approach  can  fill  the  existing  gap  between  contemporary  technology  and 
quantitative  data  analyses  present  in  the  field  of  biosystems. 

We  introduce  the  problem  of  estimating  the  effective  force  field  influencing 
cell  motility  based  on  intravital  video  microscopy.  We  name  this  problem  force 
localization  and  mapping,  or  FLAM.  Considering  an  individual  cell  as  an  agent, 
we  find  this  problem  similar  to  a  well-studied  problem  in  robotics,  the  so-called 
simultaneous  localization  and  mapping,  or  SLAM.  In  both  problems,  the  esti¬ 
mation  is  based  on  the  individual  agents  motility  model  and  the  outcome  is  a 
map  that  visualizes  the  spatial  structure  within  the  agents  environment. 

In  this  work,  we  have  developed  an  approach  for  the  estimation  of  the  force 
field  influencing  cell  motility.  For  this  approach,  we  used  the  optimal  smoother 
estimator  of  the  forces  and  formed  a  map  that  visualizes  the  force  field  based 
on  its  estimations.  We  did  not  impose  any  analytical  constraints  on  the  force 
field.  We  introduced  a  spatial  correlation  between  the  forces  by  integrating 
estimations  over  a  grid  of  rectangular  regions.  Consequently,  the  force  field 
is  estimated  and  visualized  by  the  average  forces  assigned  to  each  rectangular 
region.  Using  the  proposed  method,  we  are  confident  to  estimate  the  intensity 
and  direction  of  the  major  constant  component  of  the  force  field.  We  also  find 
that  the  directions  of  the  force  field  are  satisfactorily  identified. 

Our  results  show  that  real-time  video  images  are  sources  of  data,  enabling 
visualization,  which  can  be  used  for  studying  intriguing  spatiotemporal  phe¬ 
nomena  inside  immune  system  organs.  We  hope  that  our  approach  and  similar 
approaches  can  fill  the  existing  gap  between  contemporary  technology  and  quan- 
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titative  data  analyses  present  in  the  field.  Additional  details  of  this  research  can 
be  found  in  [2],  reproduced  in  Appendix  A  in  Section  A.  1.2. 


2.4  A  Sampling  Approach  to  Modeling  and  Con¬ 
trol  of  a  Large  Size  Robot  Population 

Motivated  by  the  close  relation  between  estimation  and  control  problems,  we 
explore  the  possibility  to  utilize  stochastic  sampling  for  computing  the  optimal 
control  for  a  large-size  robot  population.  We  assume  that  the  individual  robot 
state  is  composed  of  discrete  and  continuous  components,  while  the  population  is 
controlled  in  a  probability  space.  Utilizing  a  stochastic  process,  we  can  compute 
the  state  probability  density  function  evolution,  as  well  as  use  the  stochastic 
process  samples  to  evaluate  the  Hamiltonian  defining  the  optimal  control. 

In  this  research  area,  we  consider  a  problem  in  which  the  presence  of  a  large- 
size  robot  population  in  a  desired  region  of  operating  space  is  maximized.  This 
problem  is  formulated  in  a  hybrid  system  framework  [28] .  Its  solution,  based  on 
the  minimum  principle  for  partial  differential  equations,  is  presented  in  [29,  30], 
and  it  is  solved  numerically  when  the  presence  of  the  robots  is  maximized  along 
one  dimension  (ID). 

The  Hamiltonian,  which  defines  the  optimal  control,  includes  integral  terms 
that  depend  on  the  solution  of  a  system  of  partial  dirrerential  equations  (PDE). 
This  system  of  PDEs  is  in  general  difficult  to  evaluate  and  the  numerical  eval¬ 
uation  of  integrals  is  prone  to  errors.  However,  we  recognize  that  the  problem 
solution  can  be  simplified  and  propose  to  use  samples  of  the  stochastic  processes 
to  evaluate  the  Hamiltonian  components  from  the  expected  values  of  the  adjust 
state  distribution. 

The  benefit  of  using  a  solution  based  on  sampling,  i.e.,  computational  statis¬ 
tical  methods,  is  that  control  problems  in  robotics  could  be  solved  faster.  This 
possibility  also  depends  on  the  ability  to  implement  sampling  and  computations 
with  samples  into  the  processor  computing  the  control. 

We  considered  a  large-size  robot  population  control  problem  that  had  been 
previously  formulated  and  solved  in  a  probability  space  utilizing  systems  of 
PDEs.  Solving  these  PDEs  is  computationally  expensive;  therefore,  having  in 
mind  that  the  PDEs  are  in  close  connection  with  the  stochastic  process  to  be 
controlled,  we  explore  an  opportunity  to  utilize  the  stochastic  process  samples 
to  compute  the  control. 

This  work  resulted  in  an  algorithm  for  generating  the  stochastic  process  that 
can  be  used  to  propagate  the  state  PDF  of  the  robot  population.  We  show  that 
the  algorithm  predicts  exactly  the  state  PDF  evolution  and  we  derive  expression 
for  the  Hamiltonian  evaluation  which  involves  the  stochastic  process  samples. 
The  Hamiltonian  evaluated  in  this  way  can  be  used  in  iterations  computing  the 
optimal  control  as  if  it  was  computed  based  on  the  PDE  system  solutions.  We 
also  notice  that  the  cost  function  resulting  from  utilizing  stochastic  processes  has 
smaller  values  than  the  cost  function  computed  based  on  PDEs.  This  means  that 
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the  evaluations  involving  the  stochastic  process  samples  are  closer  to  the  true 
values.  Additionally,  the  time  required  to  compute  the  control  for  a  very  large 
population  with  this  stochastic  method  was  reduced  significantly:  the  average 
time  required  was  a  factor  of  4.5  less  than  that  required  by  the  numerical  PDE 
solver. 

In  summary,  we  can  conclude  that  utilizing  stochastic  processes  for  comput¬ 
ing  control  of  multi-robot  systems  considering  discrete,  as  well  as  continuous 
robot  states  is  possible.  By  embedding  stochastic  process  generators  into  ana¬ 
log  circuits  and  utilizing  them  in  dedicated  processors  for  computing  control, 
complex  stochastic  optimal  control  problems  can  be  solved  efficiently  and  poten¬ 
tially  exploited  for  real-time  multi-robot  systems  control.  Additional  details  of 
this  research  can  be  found  in  [14],  reproduced  in  Appendix  A  in  Section  A. 2. 10. 


2.5  Mobile  Sensor  Frame  Mapping  via  Vision 
and  Laser  Scan  Matching 

Cooperative  mobile  robots  must  have  knowledge  of  their  positions  relative  to 
the  group  in  which  they  are  operating.  Common  on-board  sensors  such  as  laser 
rangefinders  may  be  used  to  detect  and  track  other  robots  with  high  preci¬ 
sion,  though  limited  feature  recognition  and  susceptibility  to  occlusion  reduces 
the  efficacy  of  this  solution  alone.  Matching  multiple  robots  laser  scans  can 
overcome  some  of  these  issues,  but  requires  extensive  memory  usage  and  large 
communication  bandwidth.  Overhead  imaging  systems  may  also  be  utilized, 
though  sensor  nonlinearities,  field  of  view  restrictions,  and  data  latency  limit 
such  usage.  A  data  fusion  method  was  developed  to  dynamically  evaluate  mobile 
robots’  positions  by  matching  laser  scan  data  to  overhead  image  data. 

We  developed  an  architecture  for  relative  localization  of  members  of  a  robot 
collection  by  the  fusion  of  various  sensor  data  sets.  This  method  is  based  on 
a  combination  of  multiple  laser  rangefinders  making  in-plane  measurements  of 
the  environment  (which  includes  neighboring  robots)  with  one  or  more  optical 
overhead  images  taken  by  a  supervisory  observer. 

The  ultimate  goal  of  this  matching  is  to  provide  a  method  of  unifying  multi¬ 
ple  sensor  frames  into  a  single  relative  frame,  including  correction  for  rotation, 
translation,  and  scaling.  As  of  this  writing,  the  problem  of  skewing  or  stretching 
has  not  yet  been  addressed. 

Figure  2.4  shows  an  on-board  laser  data  superimposed  onto  sketchup  of  lab 
layout,  including  obstacles  and  other  robots.  The  grey  circle  indicates  a  typical 
field  of  view  for  a  camera  field-of-view  on  one  of  the  aerial  vehicles. 

Preliminary  results  are  promising,  though  the  parameter  space  has  yet  to  be 
explored  for  robustness,  processing  efficiency  and  the  scalability  of  this  problem. 
If  this  method  is  to  be  used  with  a  collective  or  swarm  robotic  system,  the 
processing  requirements  must  be  fixed  (or  nearly  so)  with  increasing  numbers 
of  robots,  requiring  that  each  individual  robot  must  do  as  much  processing  as 
possible.  Continuing  work  focuses  on  more  efficient  methods,  such  as  the  search 


14 


algorithms  in  [31]  and  the  use  of  graph  topology  for  image  processing  [32]. 

Additional  details  of  this  research  can  be  found  in  [11],  reproduced  in  Ap¬ 
pendix  A  in  Section  A. 2. 7. 


2.6  Unstructured  Formation  Control  in  Detec¬ 
tion  of  Hazardous  Perimeters 

The  problem  of  perimeter  detection  and  monitoring  has  a  variety  of  applications. 
In  this  aspect  of  our  research,  inspired  by  flocking  agents,  a  hybrid  control  law 
with  finite  states  was  explored  for  a  group  of  autonomous  robotic  agents  whose 
purpose  was  to  search  for,  detect,  and  track  a  hazardous  spill  in  an  unknown 
environment. 

In  the  system,  each  robotic  agent  was  allowed  to  be  in  one  of  three  states: 
searching,  pursuing,  and  tracking.  The  agents  were  prioritized  based  on  their 
states,  and  a  potential  field  was  constructed  for  agents  in  each  state.  For  an 
agent  in  the  tracking  state,  the  agents  location  and  velocity  as  well  as  those  of 
its  closest  leading  and  trailing  agents  were  utilized  to  control  its  movement. 

Each  agent  was  assumed  to  have  limited  and  specific  sensing  and  communi¬ 
cation  ranges.  It  was  shown  that  the  group  can  successfully  detect  the  spill  and 
track  its  boundary  when  the  spills  location  and  size  are  changing.  The  conver- 
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Figure  2.5:  Experimental  results  for  perimeter  tracking  with  five  robots. 
(a)-(g)  Snapshots  of  selected  time  points  indicating  the  location  of  the  spill 

and  the  agents. 

(h)  Linear  speeds  of  all  agents. 


gence  of  the  tracking  algorithm  was  analyzed  for  multiple  spills  under  certain 
conditions.  Simulation  results  showed  excellent  performance  in  tracking  multi¬ 
ple  spills  that  come  together  or  split  during  the  process.  The  control  algorithm 
proposed  was  then  modified  to  fit  the  requirements  of  real  experiments,  and  the 
performance  was  verified  with  a  group  of  robotic  agents.  Both  simulation  and 
experiment  results  show  that  with  the  proposed  method,  the  agents  do  success¬ 
fully  detect  and  track  spills  of  various  shapes,  sizes,  and  movements.  Figure  2.5 
shows  a  typical  experimental  result  for  five  robots  with  a  large  spill  that  grows, 
splits  into  two  spills,  and  then  collapses  to  one  large  spill. 

A  hybrid  system  of  finite  states  with  feedback  control  was  developed  for  a 
group  of  agents  with  a  limited  sensor  FOV  and  a  limited  communication  range 
to  search  for,  detect,  and  track  a  number  of  spills  in  an  unknown  environment. 
A  hierarchical  potential  field  was  designed  for  agents  in  different  states  to  con¬ 
trol  their  movement  that  provides  a  simple  control  law  for  collision  avoidance. 
Simulation  and  experiment  results  demonstrated  that  the  agents  do  successfully 
detect  and  track  spills  whose  location,  shape,  size,  and  number  may  change  over 
time.  Significantly,  this  adaptation  occurs  automatically  with  no  need  to  spec¬ 
ify  a  predetermined  distance  between  adjacent  agents.  Further  focus  on  search 
efficiency  and  optimal  convergence  rates  is  ongoing.  Additional  enhancement  of 
the  robots  sensing  ability  continues. 

Additional  details  of  this  research  can  be  found  in  [3,  12,  13],  reproduced  in 
Appendix  A  in  Section  A. 1.3,  Section  A. 2. 9,  and  Section  A. 2. 8  respectively. 
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2.7  Sensor  Choice  Optimization  and  Control  of 
Heterogeneous  Agents 

The  goal  of  this  novel  research  effort  was  to  address  the  impact  that  variations 
in  a  pre-selectable  composition  of  a  multi-agent  system  (MAS)  can  have  on 
its  resulting  capabilities  and  performance  while  utilizing  a  composition-tunable 
potential-function  control  scheme.  This  was  accomplished  by  first  defining  the 
litany  of  capabilities  and  performance  characteristics  of  individual  agents,  and 
the  relationships  between  these  (intra-agent). 

Next,  various  incarnations  of  the  MAS  were  hypothesized  where  distinctions 
existed  through  variables  such  as  agent  types,  payload  types  and  capabilities, 
relative  quantities,  motion  behaviors,  and  most  notably,  potential  function  con¬ 
trol  tuning  parameters.  These  were  extensively  represented  and  tested  both  in 
simulation  and  in  hardware  experimentation  resulting  in  rich  data  sets. 

Then,  trade-off  analyses  were  conducted  to  identify  those  factors  demon¬ 
strating  significant  importance.  These  analyses  provided  insight  to  the  control 
system  and  hardware  design  such  that  the  core  elements  of  the  MAS  itself  could 
be  orchestrated  to  provide  the  best  efficiencies  against  comprehensive  mission 
accomplishment . 

Finally,  the  analyses  provided  insight  to  mission  planners  such  that  they 
would  be  able  to  tailor  the  composition  of  a  task-specific  MAS  in  terms  of  cost 
and  performance. 

A  novel  implementation  of  potential  function  control  is  developed  that  uti¬ 
lizes  tuning  parameters  generated  from  state  space  relativity.  For  example,  a 
tuning  parameter  consisting  of  the  relative  charge  ratios  of  the  individual  agents 
and  state  space  occupancy  map  cells  is  manipulated  to  optimize  agent  disper¬ 
sion,  path  planning  efficiency,  and  mission  completion  times.  The  results  of  this 
research  show  how  the  census  of  agents,  including  quantity  and  composition, 
ultimately  impacts  the  overall  performance  of  the  MAS.  For  example,  increases 
in  quantity  result  in  a  non-linearly  fading  increase  in  performance  measured  by 
time  to  complete  a  pre-defined  mission.  This  non-linearity  means  that  more  is 
not  always  better,  especially  in  consideration  of  logistic  burdens  that  increase 
with  additional  quantities.  Figure  2.6  shows  a  characteristic  final  shared  occu¬ 
pancy  map  from  Webots  [33]  simulation. 

The  potential-function  control  architecture  proved  particularly  robust  and 
capable  when  used  in  conjunction  with  state  space  representation.  Manipula¬ 
tion  of  the  relative  charges  associated  with  MAS  components  and  occupancy 
mapping  resulted  in  a  25  percent  improvement  in  task  completion  criteria.  The 
dramatic  improvement  in  time-to-completion  versus  the  number  of  agents  uti¬ 
lized  is  shown  in  Figure  2.7. 

Finally,  the  concept  of  ’’sensing  opportunity”  is  postulated  where  the  com¬ 
bined  capability  of  a  MAS  as  a  function  of  its  agents’  sensing  systems  is  very 
much  impacted  by  each  sensor’s  opportunity  potential  to  collect  information 
of  value  to  the  desired  MAS  behavior,  and  not  necessarily  the  sum  total  raw 
capability  of  the  combined  sensor  payloads. 
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Figure  2.6:  Shared  map  of  exploration  region  (Webots  simulation) 


Figure  2.7:  Time  to  complete  exploration  mission  versus  number  of  agents. 
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Full  details  of  and  additional  aspects  to  this  research  can  be  found  in  [34] ,  and 
partially  published  in  [4,  15]  and  reproduced  in  Appendix  A  in  Section  A. 2. 11 
and  Section  A.  1.4  respectively. 


2.8  Rendezvous  in  an  Unbounded  Domain  with¬ 
out  a  Shared  Coordinate  System 

The  aggregation  and  rendezvous  problems  are  basic  hurdles  in  swarm  robotics 
and  distributed  control.  In  aggregation,  the  goal  is  to  develop  a  distributed 
control  that  allows  an  arbitrary  number  of  agents  to  act  together  to  achieve 
a  common  goal.  For  the  rendezvous  problem,  the  goal  is  for  the  agents  to 
converge  to  a  common  position.  We  assume  that  the  agents  do  not  have  a  shared 
coordinate  system  and  operate  in  an  unbounded  two-dimensional  domain.  Also, 
we  assume  communication  constraints  between  the  agents  that  are  realistic  for 
current  technology.  However,  the  control  developed  here  is  high  level  and  rather 
abstract;  we  are  not  concerned  with  issues  like  obstacle  avoidance,  and  the 
agents  are  assumed  to  be  point  masses  operating  in  discrete  time. 

Without  shared  coordinates,  robots  must  first  sense  the  position  of  neigh¬ 
bors  to  develop  a  shared  map.  In  this  research,  we  explored  a  two  stage  control 
protocol,  first  utilizing  random  motion  that  allows  the  agents  to  develop  shared 
coordinates.  The  agents  then  used  local  averaging  to  solve  the  rendezvous  prob¬ 
lem  under  some  realistic  constraints. 

Communication  restrictions  between  agents  depend  on  mission  specific  pa¬ 
rameters:  the  sensors  of  the  robots,  the  size  of  the  domain,  etc.  We  make  the 
following  assumptions  on  the  communication  abilities  of  the  agents  and  their 
ability  to  infer  others’  positions. 

•  No  shared  map:  Agents  must  meet  or  sense  each  other  in  order  share 
coordinates. 

•  Sensing  Radius:  Each  robot  is  equipped  with  a  sensor,  such  as  a  camera 
or  laser  range  finder.  Most  commonly,  such  sensors  have  a  sweep  of  less 
than  360  degrees,  and  exhibit  sensing  errors.  We  assume  that  each  agent 
can  perfectly  detect  all  other  agents  within  some  radius  Sr. 

•  Simultaneous  Communication:  Each  agent  can  maintain  at  most  k 
simultaneous  communications.  For  example,  standard  Bluetooth  technol¬ 
ogy  allows  k  =  7  simultaneous  communications.  It’s  important  to  note 
that  it  is  not  necessary  to  keep  the  same  k  partners  at  each  moment. 

•  Probabilistic  Communication  Radius:  Robots  have  a  limited  energy 
source  and  may  be  examining  an  area  much  larger  than  the  power  of  the 
antennae  can  cover.  In  general,  the  probability  of  a  successful  communi¬ 
cation  decreases  with  distance.  We  simplify  matters  by  assuming  a  fixed 
probability  pd  of  the  failure  of  each  communication  attempt. 
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We  developed  a  control  where  agents  first  move  randomly  to  create  a  cache  of 
comrades  with  common  coordinates.  Then  each  agent  picks  some  small  number 
of  individuals  from  its  cache  and  moves  to  the  average  position  of  those  individ¬ 
uals.  We  then  developed  a  random  motion  algorithm  that  supplies  the  agents 
with  common  coordinates  and  provides  bounds  on  the  time  through  simulations. 
We  discuss  the  global  dynamics  of  the  averaging  portion  of  the  control. 

The  challenge  is  to  rendezvous  a  relatively  small  number  of  agents  in  a 
perhaps  vast  environment.  We  have  given  a  two  stage  process  where  agents 
first  find  a  shared  coordinate  systems  and  are  able  to  communicate  with  each 
other.  Then,  each  agent  moves  directly  between  two  other  agents  in  order  to 
rendezvous  at  a  common  point.  We  choose  only  two  others  in  order  to  limit 
dropped  communications  due  to  technology  constraints. 

We  have  considered  distributed  controls  that  are  functions  only  of  an  agent’s 
position.  Real  robots  move  in  continuous  time  and  continuous  space,  and  we 
look  to  build  simulations  that  utilize  the  same  controls  but  keep  track  of  each 
robot’s  voyage  to  distant  positions.  For  example,  a  robot  may  choose  a  random 
point  in  its  wandering  domain  and  during  its  trip  to  that  point  encounter  another 
robot  within  its  sensing  radius.  These  encounters  were  ignored  in  this  work,  and 
so  the  results  achieved  form  a  worst  case  scenario  for  the  number  of  positions 
to  explore  (i.e.,  an  upper  bound)  to  achieve  aggregation. 

More  specifically,  suppose  each  agent  has  a  timer  based  on  its  maximum 
speed  and  the  diameter  of  its  wandering  domain.  Once  it  picks  a  random  point 
to  move  to,  it  sets  its  speed  so  that  it  arrives  at  the  point  as  the  timer  is 
expiring.  It  then  performs  a  sensing  sweep  at  that  point  (based  on  another 
timer)  and  repeats  the  process.  In  this  way  the  agents  maintain  a  great  deal  of 
synchronization. 

Additional  details  of  this  and  related  research  can  be  found  in  [17,  18], 
reproduced  in  Appendix  A  in  Section  A. 2. 13  and  Section  A. 2. 14  respectively. 


2.9  Unstructured  Formation  Control  of  Hetero¬ 
geneous  Agents  for  Exploration 

In  this  research  thrust,  we  developed  a  fully  distributed,  scalable  method  of  con¬ 
trolling  agents  with  nonholonomic  constraints  using  a  Morse  potential  function. 
This  method  successfully  controls  a  swarm  of  differential-drive  (unicycle- type) 
agents  to  stable  and  predictable  formations  whose  structures  are  not  defined  a 
priori.  The  system  achieves  a  stable,  minimal  energy  state. 

We  consider  the  effect  of  numerosity  constraints,  as  observed  in  birds  and  fish 
in  their  shoaling  and  flocking  behavior  as  a  mechanism  of  reducing  complexity, 
in  the  interest  of  achieving  fully  distributed  control  over  a  swarm  of  any  size. 
The  application  of  numerosity  constraints  to  a  swarm  system  allows  the  swarm 
to  grow  without  bound  and  with  no  increase  in  required  processing  capability 
of  the  individual  agents.  We  explore  this  parameter  as  a  method  of  minimizing 
processing  and  storage  requirements  while  still  achieving  the  qualitative  swarm 
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performance.  Results  from  simulations  are  given  for  swarms  ranging  in  size  over 
N={6,. . .  ,100}  acting  under  our  proposed  controller  as  applied  to  differential- 
drive  (unicycle-type)  robots. 

In  many  applications  that  employ  multiple  autonomous  agents,  the  ability 
of  the  agents  to  move  in  formation  is  highly  desirable,  even  if  the  formation 
structure  is  not  explicitly  given.  This  is  certainly  true  for  environmental  ex¬ 
ploration  and  coverage  applications  (see,  e.g.,  [35-38]),  or  escort  applications 
[20,  39].  Stable  formation  control  can  reduce  the  occurrence  of  agents  getting 
“lost”  or  separated  from  the  group.  In  this  research,  we  specifically  examine 
the  effect  of  “numerosity  constraints”  on  the  dynamic  formation  behavior  of  a 
group  of  differential-drive  ground  robots  under  fully  distributed  control. 

The  concept  of  a  “numerosity  constraint”  [40,  41]  is  important  to  consider; 
evidence  from  natural  systems  has  shown  that  both  birds  [42]  and  fish  [43,  44] 
exhibit  such  constraints  in  their  shoaling  and  flocking  behavior.  Clearly,  in  a 
murmuration  of  starlings  each  individual  cannot  keep  track  of  the  entire  flock 
that  numbers  in  the  thousands,  nor  can  an  individual  fish  keep  track  of  the 
entirety  of  its  school.  Each  individual  is  thus  only  aware  of  a  finite  number  of 
its  nearest  neighbors,  making  its  own  motive  decisions  based  only  on  this  in¬ 
put.  This  ability  is  clearly  desirable  in  a  scalable  application  of  multiple  mobile 
robots,  allowing  each  individual  to  bound  its  input  requirements  (and  accord¬ 
ingly,  its  memory  and  processing  requirements  as  well  as  its  communication  or 
sensing  requirements)  regardless  of  the  overall  size  of  the  swarm. 

The  unconstrained  case  achieves  a  significantly  denser  structure  due  to  the 
fact  that  each  agent  is  aware  of  all  other  agents;  thus,  the  presence  of  distant 
agents  actually  decreases  the  swarm  spacing.  In  Figure  2.8a,  the  agents  are  in 
general  separated  by  3 m  to  4m,  whereas  in  Figure  2.8b  the  agents  are  separated 
by  12m  to  13?n.  In  fact,  for  the  parameters  used  in  these  simulations,  the 
minimum-energy  spacing  for  two  agents  by  themselves  is  approximately  15.5m. 
This  reveals  that  by  applying  numerosity  constraints,  the  typical  spacing  of 
agents  can  be  regulated  to  “near”  a  design  point  regardless  of  the  overall  size 
of  the  swarm. 

It  should  be  noted  that  in  these  figures,  the  absolute  positions  and  absolute 
orientations  are  not  critical;  this  is  the  very  nature  of  seeking  such  semi-stable 
configurations.  Only  the  relative  positions  and  orientations  are  of  importance. 

We  successfully  developed  a  controller  that  yields  stable  formations  for  agents 
with  nonholonomic  constraints.  We  additionally  allow  a  constraint  to  be  placed 
on  the  agents  restricting  their  awareness  of  the  bulk  of  the  swarm  in  order  to  al¬ 
low  scalability,  and  we  have  shown  that  this  constraint  numerically  reduces  the 
ability  of  the  swarm  to  achieve  its  minimum  possible  energy;  even  under  this 
constraint,  however,  the  swarm  is  still  able  to  significantly  reduce  its  energy 
relative  to  the  starting  energy,  and  successfully  achieves  formation  control  in 
a  predictable  way,  maintaining  “design  parameters”  for  ideal  spacing  of  agents 
regardless  of  swarm  size. 

Additional  details  of  this  and  related  research  can  be  found  in  [19,  20], 
reproduced  in  Appendix  A  in  Section  A. 2. 15  and  Section  A. 2. 16  respectively. 
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Final  position  and  orientation  without  numerosity  constraints 
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(a)  Steady-state  formation  with  no 
numerosity  constraint 


(b)  Steady-state  formation  with 
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Figure  2.8:  Steady-state  formations  of  a  swarm  of  100  agents. 


2.10  Conclusion 

This  research  project  attempted  to  develop  a  unified,  systematic,  and  formal  ap¬ 
proach  to  address  the  previously  identified  issues  in  multi-agent  systems.  The 
project  proposed  to  draw  inspiration  from  biological  systems  exhibiting  swarm 
behavior  suitable  for  control  of  such  interconnected  agents.  The  basic  research 
aimed  to  develop  a  task-oriented  decentralized  controller  housed  in  each  agent 
which  derived  its  control  action  from  the  interactions  among  neighboring  agents, 
environmental  cues,  and  mission-at-lrand.  The  research  also  aimed  to  develop 
a  novel  formalism  to  represent  such  complex  and  uncertain  networks  of  large 
number  of  agents  with  the  help  of  probabilistic  graphs,  and  develop  methodolo¬ 
gies  to  analyze  systems  stability,  robustness,  and  cohesiveness  in  the  presence 
of  connection  uncertainties  and  estimation  errors. 

With  the  support  of  the  Army  Research  Office,  we  have  successfully  ex¬ 
panded  the  field  of  distributed  robotic  control  under  various  uncertainties,  ex¬ 
ploiting  recent  results  in  graph  theory  when  applicable.  Each  of  the  specific 
topics  set  forth  in  our  goals  has  been  investigated,  and  all  of  them  have  been 
achieved  to  the  extent  possible.  Our  research  has  touched  on  a  myriad  of  ap¬ 
plications  utilizing  distributed  control  and  estimation,  confirming  our  intuition 
that  this  area  of  research  is  vital  and  meaningful. 

We  thank  the  Army  Research  Office  for  its  support  of  our  research.  Without 
such  support,  this  would  not  have  been  possible.  In  the  course  of  this  research 
work,  this  funding  provided  direct  support  for  two  graduate  students,  and  pro¬ 
vided  indirect  support  (i.e. ,  fostered  collaborative  research)  for  another  graduate 
student,  two  postdoctoral  scholars,  and  at  least  six  undergraduate  students. 
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This  appendix  includes  a  compilation  of  the  manuscripts  written  and  published 
in  peer-reviewed  journals  and  peer-reviewed  conference  proceedings. 

A.l  Journal  Articles 

A. 1.1  Segregation  of  Heterogeneous  Units  in  a  Swarm  of 

Robotic  Agents 

The  following  paper  (on  the  next  6  pages)  was  co-authored  by  Manish  Kumar, 
Devendra  P.  Garg,  and  Vijay  Kumar,  and  appeared  in  March  2010  in  IEEE 
Transactions  on  Automatic  Control,  Volume  55,  pages  743-748. 
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Fig.  8.  Thrust  force  profile  during  landing  on  a  sailing  aircraft  carrier. 


force  is  bounded,  the  resulting  control  scheme  ensures  global  asymp¬ 
totic  stability.  If  however  both,  the  thrust  force  and  the  rolling  moment 
are  bounded,  the  proposed  controllers  ensure  semiglobal  asymptotic 
stability. 


References 

[1]  M.  Fliess,  J.  Levine,  P.  Martin,  and  P.  Rouchon,  “Flatness  and  defect  of 
nonlinear  systems:  Introductory  theory  and  examples,”  Int.  J.  Control, 
vol.  61,  no.  6,  pp.  1327-1361,  1995. 

[2]  J.  Hauser,  S.  Sastry,  and  G.  Meyer,  “Nonlinear  control  design  for 
slightly  non-minimum  phase  systems:  Application  to  V/STOL  air¬ 
craft,”  Automatica,  vol.  28,  no.  4,  pp.  665-679,  1992. 

[3]  P.  Martin,  S.  Devasia,  and  B.  Paden,  “A  different  look  at  output 
tracking:  Control  of  a  VTOL  aircraft,”  Automatica,  vol.  32,  no.  1,  pp. 
101-107,  1995. 

[4]  R.  Olfati-Saber,  “Global  configuration  stabilization  for  the  VTOL  air¬ 
craft  with  strong  input  coupling,”  IEEE  Trans.  Autom.  Control,  vol.  47, 
no.  11,  pp.  1949-1952,  Nov.  2002. 

[5]  A.  R.  Teel,  “Global  stabilization  and  restricted  tracking  for  multiple 
integrators  with  bounded  controls,”  Syst.  Control  Lett.,  vol.  18,  no.  3, 
pp.  165-171,  1992. 

[6]  H.  Ye,  H.  Wang,  and  H.  Wang,  “Stabilization  of  a  PVTOL  aircraft  and 
an  inertia  wheel  pendulum  using  saturation  technique,”  IEEE  Trans. 
Control  Syst.  Technol. ,  vol.  15,  no.  6,  pp.  1143-1150,  Nov.  2007. 

[7]  A.  Zavala-Rio,  I.  Fantoni,  and  R.  Lozano,  “Global  stabilization  of  a 
PVTOL  aircraft  model  with  bounded  inputs,”  Int.  J.  Control,  vol.  76, 
no.  18,  pp.  1833-1844,  2003. 

[8]  A.  Hably,  F.  Kendoul,  N.  Marchand,  and  P.  Castillo,  “Further  results 
on  global  stabilization  of  the  PVTOL  aircraft,”  in  Positive  Systems, 
C.  Commault  and  N.  March,  Eds.  Berlin,  Germany:  Springer- Verlag, 
2006,  pp.  303-310. 


743 

Segregation  of  Heterogeneous  Units 
in  a  Swarm  of  Robotic  Agents 

Manish  Kumar,  Devendra  P.  Garg,  and  Vijay  Kumar 


Abstract — There  are  several  examples  in  natural  systems  that  exhibit  the 
self-organizing  behavior  of  segregation  when  different  types  of  units  in¬ 
teract  with  each  other.  One  of  the  best  examples  is  a  system  of  biological 
cells  of  heterogeneous  types  that  has  the  ability  to  self-organize  into  spe¬ 
cific  formations,  form  different  types  of  organs  and,  ultimately,  develop 
into  a  living  organism.  Previous  research  in  this  area  has  indicated  that 
such  segregations  in  biological  cells  and  tissues  are  made  possible  because 
of  the  differences  in  adhesivity  between  various  types  of  cells  or  tissues.  In¬ 
spired  by  this  differential  adhesivity  model,  this  technical  note  presents  a 
decentralized  approach  utilizing  differential  artificial  potential  to  achieve 
the  segregation  behavior  in  a  swarm  of  heterogeneous  robotic  agents.  The 
method  is  based  on  the  proposition  that  agents  experience  different  magni¬ 
tudes  of  potential  while  interacting  with  agents  of  different  types.  Stability 
analysis  of  the  system  with  the  proposed  approach  in  the  Lyapunov  sense 
is  carried  out  in  this  technical  note.  Extensive  simulations  and  analytical 
investigations  suggest  that  the  proposed  method  would  lead  a  population 
of  two  types  of  agents  to  a  segregated  configuration. 

Index  Terms — Differential  potential,  heterogeneous  swarm,  segregation. 


I.  Introduction 

Control  of  multiple  vehicles  carrying  out  a  cooperative  task  has  re¬ 
ceived  the  attention  of  several  researchers  working  in  the  area  of  mo¬ 
bile  robotics  because  of  its  potential  applications  in  a  number  of  fields 
including  large-scale  sensing  operations,  cooperated  search  and  rescue 
task,  surveillance,  reconnaissance,  and  boundary  protection.  One  of  the 
very  first  applications  of  cooperative  control  of  multiple  agents  was  be¬ 
havioral  simulation  of  flocks  of  birds,  herd  of  animals  and  schools  of 
fish  for  computer  graphics  presented  by  Reynolds  [1 1],  He  noted  three 
simple  behaviors  that  led  to  flocking  in  birds  and  fish:  collision  avoid¬ 
ance,  velocity  matching,  and  flock  centering  (in  decreasing  order  of 
precedence).  Based  on  inspirations  drawn  from  Reynolds’  approach, 
many  researchers  have  focused  on  designing  a  decentralized  controller 
for  achieving  flocking  behavior.  The  examples  of  these  decentralized 
control  mechanisms  include  techniques  based  on  behavioral  control 
[2],  leader-follower  [5],  and  artificial  potential  function  [9],  [10]. 

Obtaining  a  desired  shape  and  pattern  of  the  formation  can  be  crit¬ 
ical  for  a  specific  mission  relying  on  coordinated  action  by  multiple 
mobile  agents.  For  example,  if  a  large  number  of  robots  need  to  be 
deployed  to  perform  complicated  tasks  such  as  surveillance  of  a  large 
area,  perimeter  protection  of  a  vital  installation,  or  surrounding  site  of 
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a  chemical  or  hazardous  waste  spill,  the  robots  must  be  able  to  au¬ 
tonomously  organize  themselves  in  certain  formation,  pattern,  or  shape. 
In  many  situations,  it  may  not  be  possible  to  integrate  all  of  the  capa¬ 
bilities,  sensing  or  actuation,  required  for  different  kinds  of  tasks  in  an 
individual  robot.  Accordingly,  the  robots  may  have  heterogeneous  abil¬ 
ities  for  sensing  and  actuation  that  will  enable  them  to  perform  specific 
tasks.  Heterogeneous  robots  must  be  able  to  self-organize  themselves 
in  a  mission  specific  manner  to  carry  out  the  tasks  assigned  to  them  co¬ 
operatively.  This  technical  note  focuses  on  one  particular  behavior  of 
segregation  in  a  swarm  of  heterogeneous  agents. 

Segregation  is  a  phenomenon  which  is  seen  in  several  biological 
systems.  For  example,  ants  sort  brood  [4]  in  concentric  annuli  with  the 
smallest  members  (need  least  tending)  in  the  middle  and  the  largest 
members  (need  most  tending)  on  the  periphery.  Several  species  of 
cockroaches  aggregate  or  segregate  [1]  based  on  odors  of  strains. 
Similarly,  some  anuran  species  of  amphibian  larvae  [8]  have  shown  to 
recognize  and  associate  with  their  siblings  or  half-siblings.  Segrega¬ 
tion  of  cells  based  on  their  types  and  functionalities  is  one  of  the  best 
examples  of  sorting  in  biological  systems.  Cell  segregation  is  one  of 
the  basic  phenomenon  which  leads  to  formation  of  patterns  and  organs 
in  living  organisms.  Complex  patterns  form  in  living  systems  as  a 
result  of  interactions  among  basic  constituent  units.  Examples  include 
the  formation  of  ocular  dominance  stripes  in  the  visual  cortex  of  cats 
and  monkeys,  and  the  formation  of  the  tentacle  pattern  in  hydra. 

Study  of  the  mechanisms  via  which  these  patterns  form  can  provide 
valuable  insights  for  distributed  problem  solving  strategies.  Most  of  the 
models  in  literature  that  explain  the  formation  of  patterns  rely  on  the 
principles  of  differential  attraction/inhibition.  For  example,  Swindale's 
model  [14]  accounts  for  the  formation  of  ocular  dominance  stripes  in 
the  visual  cortex  based  on  the  local  activation  and  lateral  inhibition 
(LALI)  mechanism  [6]  for  the  same  type  of  synapses,  and  the  local  in¬ 
hibition  and  lateral  activation  (LILA)  mechanism  [3],  [14],  the  reverse, 
for  the  unlike  type  of  synapses.  Similarly,  reaction-diffusion  [17]  is  a 
model  to  mathematically  represent  the  transport  phenomenon  in  bio¬ 
logical  and  natural  systems.  This  model  tries  to  explain  the  interaction 
of  particles  with  the  environment  and  their  motion  in  space.  One  of 
the  models  to  explain  the  brood  sorting  in  ants  is  based  on  differential 
diffusion.  This  model  [12]  achieves  sorting  in  two  phases.  In  the  first 
phase,  called  clustering  phase,  broods  of  all  types  are  clustered  around 
the  same  place.  In  the  second  phase,  called  spacing  phase,  ants  move 
broods  out  in  a  random  fashion  where  the  frequency  and  the  distance 
of  movement  depends  on  the  weight  of  the  brood. 

In  early  1990s,  Graner  and  Glazier  [7]  proposed  a  lattice  based  mod¬ 
ified  version  of  large-Q  Potts  model  with  differential  adhesivity  to  ex¬ 
plain  and  simulate  the  segregation  of  a  mixture  of  two  types  of  biolog¬ 
ical  cells.  In  fact,  it  has  been  long  known  [13]  that  it  is  the  difference  in 
intercellular  adhesivity  that  leads  to  sorting  in  cells.  The  final  state  of 
cell  configuration  is  achieved  when  the  overall  surface  energy  is  glob¬ 
ally  minimized.  Based  on  this  principle,  Steinberg  [13]  postulated  that 
two  types  of  cellular  units  A  and  B  are  segregated  when: 


technical  note  is  motivated  by  this  differential  adhesivity  phenomenon 
observed  in  biological  systems. 

The  primary  contribution  of  this  technical  note  is  in  the  synthesis  and 
analysis  of  a  controller  that  allows  heterogeneous  robots  to  segregate 
so  that  they  form  separate  groups  comprising  of  homogeneous  robots. 
The  specific  problem  is  formulated  in  the  next  section.  This  is  followed 
by  the  formulation  and  the  analysis  of  the  control  law.  Finally,  an  exten¬ 
sive  simulation  study  is  presented  as  a  means  to  validate  the  approach 
presented  in  this  technical  note. 


II.  Problem  Formulation 

The  group  of  mobile  agents,  considered  in  this  technical  note,  con¬ 
sists  of  Iff  fully  actuated  agents,  each  of  whose  dynamics  is  given  by 
the  double  integrator 


qi  =  pi  and  pi  =  u  ;  i  =  1, 2, . . . ,  N  (2) 

where  q;  and  p;  are  m  -dimensional  position  and  velocity  vectors  re¬ 
spectively  of  agent  i.  The  group  of  mobile  agents  consists  of  two  dif¬ 
ferent  types  of  agents:  type  A  and  type  B.  The  number  of  agents  of 
type  A  is  Na  and  that  of  type  B  is  Nb  such  that  N  =  Na  +  Nb  ■ 
The  objective  of  this  technical  note  is  to  synthesize  a  controller  that 
can  asymptotically  flock  and  separate  the  robots  of  type  A  and  type  B 
into  two  different  groups  (referred  to  as  segregation).  Agents  are  said  to 
flock  (asymptotically)  when  all  agents  achieve  the  same  velocity  vector 
and  the  distances  between  the  agents  are  stabilized.  A  group  of  agents 
of  types  A  and  B  are  said  to  be  segregated  if  the  average  distance  be¬ 
tween  the  agents  of  the  like  types  (type  A  or  type  B)  is  less  than  the 
average  distance  between  the  agents  of  the  unlike  types  (i.e.,  between 
the  agents  of  type  A  and  type  B).  Or 


XV 

where 

Y. 


AA  ,  AB  BB  .  AB 

t<ivg  Lug;  I  avg  Lifg 


(3) 


is  the  average  distance  between  the  agents  of  types  X  and 


III.  Control  Law  Formulation 

This  section  presents  the  control  law  which  enables  a  population  of 
heterogeneous  agents  to  asymptotically  flock  as  well  as  segregate.  For 
a  system  of  N  mobile  agents  with  N a  agents  of  type  A  and  Nb  agents 
of  type  B,  the  following  feedback  control  law  is  considered: 


Wab  <  - 


(  Waa  +  Wbb  ) 


2 


(1) 


where  IF,]  4  and  Wbb  represent  the  work  of  cohesion  between  cells 
of  same  types  (i.e.,  between  types  A  &  A,  and  B  &  B  respectively),  and 
Wab  represents  the  work  of  adhesion  between  cells  of  types  A  and  B. 
The  method  for  segregation  in  artificial  mobile  agents  presented  in  this 


«i  =  -  (hi  ~  #11 )  “  Ylip‘  ~  ph  (4) 

i  i 

where  m  is  the  control  input  to  the  agent  i,  V;j  (\\qj  —  q,\\)  is  the  arti¬ 
ficial  potential  of  interaction  between  agents  i  and  j,  j  £  {A,  B}  \i, 
|| qj  —  7;  1 1  is  the  Euclidean  norm  of  the  vector  (qj  —  qi),  and  Vqi  is  the 
gradient  with  respect  to  the  coordinates  of  the  agent  i  i.e.,  q-; .  First  term 
in  (4)  represents  the  gradient  of  potential  function,  and  the  second  term 
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Interaction  Potential 


Interaction  Force 


(b) 


Fig.  1.  Interaction  potential  and  force,  (a)  Interaction  potential  versus  inter¬ 
agent  distance;  (b)  interaction  force  between  agents. 


Potentials  arising  due  to  interactions  between  types  B  and  B  (V.BB), 
and  c)  Potentials  arising  due  to  interactions  between  types  A  and  B 
(Vij  ).  Our  main  result  for  cause  of  segregation  is  based  on  the  con¬ 
cept  of  differential  potential,  i.e.,  agents  experience  different  magni¬ 
tudes  of  potential  when  they  are  interacting  with  the  agents  of  different 
types.  This  is  achieved  by  assuming  different  values  of  the  parameter 
do  for  different  types  of  interaction  potentials.  Explicitly,  in  (5) 

d  =  {  d°  A  =  d°B  if  (i,~'  e  A%>  or  J  e  (6) 

°  \d£B  if(i€  A,  j  €B)  or (i  £  G  A).  K 

The  control  law  for  segregation  can  be  achieved  when 

do  =  do  <  d0  (7) 

Fig.  1(b)  shows  the  plot  of  force  of  interaction  due  to  similar  types  and 
due  to  dissimilar  types  of  robots  versus  inter-agent  distance  when  the 
condition  (7)  for  segregation  controller  is  met.  In  this  case,  it  can  be 
seen  that  the  interaction  force  between  agents  of  same  types  is  greater 
than  that  between  agents  of  different  types  at  any  given  distance.  Hence, 
this  method  of  segregation,  based  on  differential  potential,  is  analogous 
to  Steinberg’s  [13]  explanation  of  cellular  segregation  based  on  differ¬ 
ential  adhesiveness  (see  (1)). 

IV.  Controller  Analysis 

In  this  section,  we  carry  out  an  analysis  of  convergence  and  stability 
properties  of  the  system  of  multiple  agents  obeying  dynamics  given  by 
(2)  under  control  law  given  by  (4).  In  order  to  carry  out  the  stability 
analysis  of  the  collective  motion  of  the  agents,  the  following  positive 
definite  function  can  be  chosen  as  the  Lyapunov  function: 

P)  =  V(q)  +  |pTp  (8) 

where  q  £  Brn  is  stacked  position  vector  of  all  agents,  p  £  B,n'' 
is  stacked  velocity  vector  of  all  agents,  and  V'(q)  :  B"‘  ''  — »  R>0  is 
the  total  potential  energy  of  the  system  which  consists  of  three  parts 
resulting  from  interactions  between  agents  of:  1)  type  A,  2)  types  A 
and  B,  and  3)  type  B.  This  can  be  written  as 


represents  damping  and  causes  the  agents  to  match  their  velocities  with 
each  other.  The  artificial  potential  I  ;j(||<7j  —  q;  ||)  :  Bm  — *■  B>0  is  a 
positive  function  of  relative  distance  between  a  pair  of  agents  [9]  given 
by 

Vij  (  hi  ~  Qi  II)  =  a  (in  ( llqy  -  < a  || )  +  )  (5) 

where,  a  is  a  scalar  control  gain,  and  do  is  a  parameter.  Fig.  1(a)  shows 
the  potential  function  plotted  against  the  inter-agent  distance.  As  in¬ 
dicated  in  the  figure,  the  potential  becomes  minimum  when  the  inter¬ 
agent  distance  is  do-  The  initial  condition  and  dynamics  exclude  the 
situations  where  </;  =  qj  where  the  control  law  (4)  is  undefined. 

Since  there  are  two  types  of  mobile  agents  involved  in  the  system, 
there  are  three  different  kinds  of  artificial  potentials  involved:  a)  Po¬ 
tentials  arising  due  to  interactions  between  types  A  and  A  (VAA),  b) 


V(q)  =I/AA(q)  +  Vbb  ( q)  +  Vab(  q) 

=  tE  E  *S;(lto-««-ll) 

ieAjeAj^; 

+  E  E  Vij  (ii«j  -  «;ii) 

iGAjeB 

+  ^E  E  Vii  (II®-*  ID-  (9) 

The  collective  dynamics  of  the  system  is  given  by 

q=p  (10) 

P  =  -  W(q)  -  T(q)p  (IT) 

where  L( q)  £  j^mNxmh  -dimensional  graph  Laplacian  (see 
[10]).  Graph  Laplacian  represents  the  interconnections  present  in  a 
graph  in  a  matrix  form,  and  can  be  obtained  from  other  graph  theoretic 
quantities  such  as  degree  matrix  and  adjacency  matrix.  Among  other 
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important  properties  of  graph  Laplacian  matrix  L(q),  it  is  a  positive 
semi-definite  matrix. 

Lemma  4.1:  Consider  a  system  of  N  mobile  agents.  Each  of  the 
agents  follows  the  dynamics  given  by  (2),  and  with  feedback  control 
law  given  by  (4).  For  any  initial  condition  belonging  to  the  level  set 
of</>(q.p)  given  by  Oc  =  {(q.p)  *  (i(q.p)  <  C)}  with  C  >  0, 
and  when  the  underlying  graph  of  the  system  is  fully  connected  all  the 
time,  then  the  system  asymptotically  converges  to  the  largest  invariant 
set  in  0/  C  Lie-  The  points  in  the  largest  invariant  set  in  0/  have  a 
velocity  that  is  bounded,  the  velocities  of  all  agents  match,  and  the  total 
potential  of  all  agents  given  by  (9)  approaches  a  local  minimum. 

Proof:  Differentiating  d>(  q,  p )  with  respect  to  time  and  using  (11) 
one  gets 


of  different  types  (rA,.fl)  is  lower  bounded  by  the  parameter  dgB ,  i.e., 

ab  -x  jAB 

T avg  ^  dQ 

Proof:  Since  F/jB  is  a  scalar  quantity,  in  m  dimensional  space, 
(17)  can  be  equivalently  written  into  the  following  m  scalar  equations: 

^  y  =0  r  =  1,2,...  m  (18) 

iEAJEB 

where  qfj  =  (qj  —  ql)  represents  the  distance  along  the  rth  dimen¬ 
sion.  Hence,  qij  =  »/£"£  ((lij)2  ■  Let  us  assume  that  out  of  NaNb 
possible  terms  of  FfjB ,  s  is  the  number  of  terms  for  which  F;AB  >  0 
and  t  (s  +  t  =  NaNb  )  is  the  number  of  terms  for  which  F.,AB  <  0. 
Hence,  (18)  can  be  written  as 


(i(q.p)  P  W(q)  +pFp 

=  pTW(q)  +pT  (— W(q)  -  f.(q!p) 

=  -pri(q)p= -y  y  \\pj  -Piii2  <  o.  (i2) 

*  j 

From  Lasalle’s  Invariance  Principle,  all  solutions  of  the  system  starting 
in  Lie  will  converge  to  the  largest  invariant  set  in  fij  =  {(q,  p)  € 
8c  :  (A(q,  p )  =0)},  and  this  happens  when  the  velocities  of  all  agents 
match.  For  a  detailed  proof  of  this,  please  see  [9],  [10],  Furthermore,  in 
the  steady  state,  the  velocities  of  the  agents  do  not  change,  i.e.,  p  =  0. 
For  a  proof  of  this  result,  see  [15],  [16],  From  (11),  it  follows  that  at 
the  steady  state: 


W(q)  =  0. 


(13) 


Hence,  the  total  potential  of  all  agents  is  locally  minimized. 

The  gradient  in  (13)  for  an  agent  i  €  A  is  given  by  the  equation 


V9iV(q)=  J] 


Ik  'fill  Ik  </.|P 


(ft  ~  ft) 


hi  -  ft  II 


+  E' 

j£B 


1  d  q 

Ik II  Iki -ft  IP 


(ft  -  ft) 

lift  -  ft II 


=  0.  (14) 


Equation  (14)  can  be  rewritten  as 


S  t 

y  FkBql  +  y  F,ABq,  =0  r  =  1,2,...  m.  (19) 

k  =  l  /=] 

Note  that,  for  the  ease  of  notations,  the  subscript  ij  has  been  replaced 
by  subscripts  k  and  l.  If  we  assume:  Ylk—i  FkB  Qk  =  ft  then 
FlABq\  =  -ft. 

Then,  for  the  terms  with  FAB  >  0 


s 

m 

m  /  s  \  ^ 

E\ 

fc=l  \ 

y  (FkAB€f  >  \ 

r=l  \ 

y  y«  . 

r=l  \k= J  / 

Noting  that  q^  = 

=  \/X]r-i  hi)2,  ^e  LHS  of  (20)  becomes 

(20) 


y 


t-=  I  \  r=l 

RHS  of  (20)  becomes 


y(ys<zD2  =  y^ 


AB 

k  m 


(21) 


\  r=  1  \k  —  L 


\  r= 1 


(22) 


Hence,  (20)  can  be  written  as 


where 


y  F,jA(tti  -  ft) + y  fabul  ~  qi) = o 

,je  A,  j-/:i  jeB 


(15) 


y  FABqk  >  C. 

k=l 


Similarly,  for  the  terms  with  FAB  <  0 


(23) 


1  dAA 

lift  -  ft  IP  lift  -  ft  II3. 

1  dAB 

lift  -  ft  ip  “  lift  -  ft  IIs  J '  (16) 

If  we  sum  up  (15)  for  all  i  €  A,  and  noting  that  FtAA  =  FjAA,  then 
we  will  obtain  the  following  equation: 

yyys(ft-ft)=o.  uv) 

ieAjeB 

Equation  (17)  leads  to  the  following  proposition: 

Proposition  4.2:  A  system  of  heterogeneous  swarming  agents  con¬ 
sisting  of  two  types  of  agents  and  following  dynamics  (2)  and  control 
law  (4)  flock  together  such  that  the  average  distance  between  the  agents 


pAA  _ 

pAB  _ 

1  y 


y  F,ABqi<~c.  (24) 

/=  1 

From  (16),  it  is  evident  that  each  term  of  FAB  >  0  would  tend  towards 
zero  when  cIqB  would  be  infinitely  large.  This  means  that  each  cr  (and 
hence  c)  would  tend  towards  zero  when  dg  B  is  made  large  enough.  Fur¬ 
ther,  in  order  to  prove  that  each  term  of  FAB  <  0  would  tend  towards 
zero,  we  make  use  of  the  result  that  every  local  minima  of  the  total  po¬ 
tential  V(q)  is  an  a — lattice  and  vice  versa  (See  lemma  3  of  [  10]).  This 
n — lattice  is  a  spatial  configuration  in  which  inter-agent  distances  be¬ 
tween  all  neighbors  of  the  proximity  graph  (i.e.,  nearest  neighbors)  is 
equal  to  the  value  at  which  the  pairwise  inter-agent  potential  between 
the  neareast  neighbors  is  minimum.  This  means  that  the  nearest  type  B 
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neighbors  of  type  A  agents  would  be  at  distance  dgB,  and  hence  each 
term  of  FAB  <  0  will  in  fact  tend  to  zero.  Hence,  from  (23)  and  (24), 
for  large  values  of  dg  B ,  we  can  write 

y  F^B^  +  E  F’ABqi  ~  °-  (25) 

k—L  1=1 

Using  (16)  and  noting  that  qi  =  dgB  +  x-k  for  FAB  >  0  and  q\  = 
dgB  —  xi  for  FAB  <  0,  where  Xk  and  x\  are  non-negative  numbers, 
(25)  can  be  written  as 


y  — ^ T  -  y  — ^ ^  =  o.  (26) 

hi  ( dAB  +  xk)  (dAB  -  x,) 

Since  {dAB  +  Xkj  >  (dgB  —  xi)  for  any  pair  of  h  and/,  it  is  evident 
that 

S  t 

y.nh  -  y  >  0  (27) 


sxr  —  txc /  >  0  (28) 

where  x c  =  (1/s)  Ylk=i  and  xcr  =  ( 1/t)  X!/=1  xi-  The  average 
distance  between  agents  of  type  A  and  type  B  is  given  by 

r£?g  =  [s  (d.gB  +  Xc'j  + 1  (dgB  -  xct^  J 

=  dgB  4 - ^—(sav  -  txc! )  >  dgB .  (29) 

s+t 

Hence,  we  can  always  choose  dgB  to  be  arbitrarily  large  which  in  turn 
would  make  rABg  arbitrarily  large.  ■ 

It  may  be  noted  that  choosing  an  arbitrarily  large  value  for  dgB  does 
not  make  the  average  distance  between  the  same  type  of  agents,  i.e., 
Tang  an£l  ''a^i ,  arbitrarily  large.  It  is  evident  from  (15)  and  (16)  that 
the  values  of  rAAg  and  rBB  would  depend  on  the  parameters  dg  and 
dBB  as  well.  In  the  proof  for  Proposition  4.2,  we  noted  that  each  term  of 
Fab  (positive  or  negative)  would  tend  towards  zero  when  dgB  would 
be  infinitely  large  which  means  that  the  force  on  each  agent  due  to  po¬ 
tential  from  any  agent  of  the  other  type  would  be  negligible.  Hence,  the 
agents  of  the  same  type  would  aggregate  or  flock  in  the  same  manner 
as  they  would  in  the  absence  of  the  agents  of  the  other  type.  Thus, 
choosing  a  value  of  dgB  to  be  large  enough  would  ensure  that  the  seg¬ 
regation  happens  according  to  condition  (3). 


V.  Simulation  Results  and  Discussions 

Extensive  simulations  were  carried  out  to  verify  the  results  obtained 
in  the  previous  sections.  In  the  simulations,  the  following  parameters 
were  assumed:  dAA  =  dBB  =  3,  and  dgB  =  6.  Fig.  2(a)  shows  the 
configuration  of  a  population  of  20  agents  (10  each  of  types  A  andB)  in 
a  2-D  space  at  different  times  during  the  simulation.  The  agents  started 
off  at  a  random  configuration,  and  the  control  law  given  by  (4)  based 
on  differential  potential  was  applied  to  the  agents.  The  final  configura- 
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Fig.  2.  Simulation  with  20  agents,  (a)  Configurations  of  agents  at  times  T  = 


0  sec  (top  left),  T  =  16T  sec  (top  right),  T  =  333  sec  (bottom  left),  and 


T  =  500  sec  (bottom  right);  (b)  average  distances  between  agents  of  types 
A  and  A  (rAA),  B  and  B  (rBB),  and  A  and  B  (rAB). 

n  avgs ’  v  avg*'  v  avgs 


tion  at  time  T  =  500  sec  shows  that  the  agents  of  types  A  and  B  form 
two  separate  groups.  Fig.  2(b)  shows  the  plot  of  the  average  distances 
between  the  agents  of  types  A  and  A  (rAvg),  B  and  B  (rB,B),  and  A 
and  B  (rABg)  versus  time  for  the  above  simulation.  At  the  final  config¬ 
uration,  the  average  distances  rAAg ,  rBvg ,  and  rABg  were  found  out  to 
be  3.21,  3.22,  and  6.97  respectively,  which  clearly  shows  that  the  pop¬ 
ulation  was  segregated  based  on  the  condition  given  by  expression  (3). 

The  results  given  above  were  for  only  one  simulation  run.  In  order 
to  verify  that  the  method  presented  in  this  technical  note  leads  to  segre¬ 
gation  in  general  in  a  population  of  heterogeneous  agents,  an  extensive 
simulation  study  was  carried  out  in  which  more  than  100  runs  were 
performed.  Fig.  3  shows  the  average  distance  between  agents  at  the 
steady-state.  In  each  of  the  simulation  runs,  the  population  of  agents 
consisting  of  types  A  and  B  was  initialized  in  a  random  configuration 
obtained  via  uniform  distribution  of  agents  in  a  2-D  space,  and  the  num¬ 
bers  of  agents  of  types  A  and  B  were  each  chosen  randomly  between 
the  values  of  5  and  15.  Each  of  the  runs  was  carried  out  for  500  seconds 
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Average  Distances 


Fig.  3.  Average  distance  between  agents. 


of  simulation  time.  The  average  distances  between  the  agents  shown  in 
the  figure  were  calculated  at  the  steady  (final)  state.  It  can  be  easily  seen 
that  the  average  distance  between  agents  of  type  A  (r^)  and  average 
distance  between  agents  of  type  B  (rf^)  is  less  than  the  average  dis¬ 
tance  between  agents  of  type  A  and  B  (r^)  for  each  of  the  simulation 
runs.  Also,  it  is  evident  from  the  figure  that  is  always  greater  than 
the  parameter  supporting  our  result  from  Proposition  4.2. 


VI.  Conclusion 

The  technical  note  presents  a  decentralized  technique  to  achieve  a 
self-organized  behavior  of  segregation  in  a  population  consisting  of 
two  different  types  of  robotic  agents.  Inspired  from  the  mechanisms 
by  which  segregation  is  achieved  in  some  of  the  biological  systems, 
the  method  presented  in  this  technical  note  is  based  on  the  concept 
of  differential  artificial  potential.  In  this  differential  artificial  potential 
function  framework,  agents  experience  different  magnitudes  of  poten¬ 
tial  when  they  are  interacting  with  the  agents  of  different  types.  The 
technical  note  presents  the  stability  analysis  of  a  population  of  agents 
in  a  Lyapunov  framework,  and  lays  down  an  analytical  foundation  for 
synthesis  of  controllers  for  segregation  in  artificial  potential  function 
formulation.  Extensive  simulation  studies  verify  the  results  obtained  in 
this  technical  note,  and  demonstrate  the  effectiveness  of  the  proposed 
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Kalman  Smoother  Based  Force 
Localization  and  Mapping  Using 
Intravital  Video  Microscopy 

Motility  is  an  important  property  of  immune  system  cells.  It  provides  cells  with  the  ability 
to  perform  their  function  not  only  at  the  right  time  but  also  at  the  right  place.  In  this 
paper,  we  introduce  the  problem  of  modeling  and  estimating  an  effective  force  field 
directing  cell  movement  by  the  analysis  of  intravital  video  microscopy.  A  computational 
approach  is  proposed  for  solving  this  problem  without  dealing  with  a  parametrized  spa¬ 
tial  model  of  the  field  in  order  to  avoid  potential  errors  due  to  inaccurate  spatial  model 
assumptions.  We  consider  the  dynamics  of  cells  similar  to  the  dynamics  of  distributed 
agents  typically  used  in  the  field  of  swarm  robotics.  The  method  utilizes  a  fixed-interval 
Kalman  filter  based  smoother.  Its  application  results  in  a  map  giving  the  intensity  and 
direction  of  the  effective  force  field.  The  results  show  that  real-time  video  images  are  a 
source  of  data,  enabling  us  to  visualize  intriguing  spatiotemporal  phenomena  inside 
immune  system  organs.  The  proposed  approach  can  fill  the  existing  gap  between  contem¬ 
porary  technology  and  quantitative  data  analyses  present  in  the  field  of  biosystems. 
[DOI:  10.1115/1.4002485] 


1  Introduction 

The  lymph  nodes  are  secondary  lymphoid  tissues  and  anatomi¬ 
cal  sites,  where,  along  with  the  spleen,  the  major  component  pro¬ 
cesses  of  the  immune  response  to  infection  and  vaccination  occur, 
including  antigen  recognition,  cellular  activation,  differentiation, 
affinity  maturation,  and  the  establishment  of  immune  memory. 
Recent  advances  in  imaging  technology  provide  us  a  novel  insight 
into  the  dynamics  of  immune  cell  interactions  in  intact  lymph 
nodes  [1-3]. 

In  this  paper,  we  introduce  the  problem  of  estimating  the  effec¬ 
tive  force  field  influencing  cell  motility  based  on  intravital  video 
microscopy.  We  name  this  problem  force  localization  and  map¬ 
ping  (FLAM).  Considering  an  individual  cell  as  an  agent  [4],  we 
find  this  problem  similar  to  a  well-studied  problem  in  robotics,  the 
so-called  simultaneous  localization  and  mapping  (SLAM)  [5,6]. 
In  both  problems,  the  estimation  is  based  on  the  individual  agent’s 
motility  model  and  the  outcome  is  a  map  that  visualizes  the  spatial 
structure  within  the  agent’s  environment. 

In  the  immunological  context,  effective  forces  originate  from 
the  mechanical  forces  of  extracellular  environment  and  the  active 
responses  of  cells  to  gradients  of  chemotactic  and  haptotactic  sub¬ 
stances.  Currently,  there  is  no  alternative  microscopy  method 
available  for  measuring  these  forces  under  physiological  condi¬ 
tion.  Therefore,  we  can  only  rely  on  a  computational  method  for 
the  effective  force  field  estimation  from  intravital  video  micros¬ 
copy.  The  estimation  provides  quantitative  information  that  is  im¬ 
portant  for  understanding  the  behavior  of  the  immune  system  as  a 
whole.  Cellular  aggregation  and  lymphoid  tissue  morphology  are 
determined  largely  by  these  fields  and  are  of  great  consequence  in 
the  development  of  an  effective  immune  response. 

Our  emphasis  in  solving  the  FLAM  problem  is  the  development 
of  an  approach  that  is  independent  of  any  assumption  about  the 
force  field  shape.  In  this  way,  we  hope  to  avoid  an  error  in  the 
field  estimation  resulting  from  possibly  wrong  intuition,  trying, 
for  example,  to  match  the  shape  of  the  field  to  visual  patterns  in  a 
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video  sequence.  By  proposing  this  approach,  however,  we  do  not 
exclude  the  possibility  that  after  multiple  field  estimations,  as  in 
the  case  of  repeated  experiments,  the  visual  patterns  can  be  con¬ 
sidered.  This  will  likely  prove  fruitful  once  confidence  in  the  re¬ 
lation  between  the  force  field  shape  and  the  visual  patterns  has 
been  established. 

The  data  and  model  that  we  use  in  our  estimation  approach  are 
described  in  Secs.  2  and  3,  respectively.  Section  4  explains  the 
effective  force  estimation  based  on  the  fixed-interval  optimal 
smoother.  We  discuss  the  limitations  of  this  approach  in  Sec.  5. 
Section  6  describes  the  force  estimation  in  the  case  of  a  time¬ 
stationary  uniform  field.  The  estimation  of  a  nonuniform  time¬ 
stationary  field  is  introduced  in  Sec.  7.  Finally,  Sec.  8  presents  the 
conclusions  based  on  this  research. 

2  Cell  Motility  Data 

In  the  case  of  the  lymph  node,  cells  are  50-250  pm  deep  in 
the  tissue.  The  cell  motility  is  observed  by  two-photon  micros¬ 
copy,  which  minimizes  the  damage  to  cells  and  surrounding  tis¬ 
sues.  Two-photon  microscopy  requires  that  the  observed  cells  be 
labeled  with  fluorescent  dyes  emitting  the  light  of  a  specific  wave¬ 
length  when  excited  by  a  laser  beam.  In  the  case  of  the  experi¬ 
mental  setup  for  lymph-node  imaging  that  we  consider,  the  laser 
beam  typically  sweeps  out  a  two-dimensional  region  of  200 
X  150  pm2  while  its  depth  in  the  tissue  is  varied,  providing  an 
image  stack  of  50  pm  thick  3D  volume  within  the  lymph  node. 
We  refer  to  this  volume  as  the  visual  field. 

Figure  1  shows  a  typical  2D  image  projection  of  the  intra¬ 
lymph  node  image  stack  in  which  T-cells  and  dendritic  cells  are 
labeled  with  two  different  colors  [4].  The  sampling  interval  be¬ 
tween  subsequent  scans  of  the  visual  field  is  in  the  range  from  18s 
to  21s.  The  data  we  derive  from  the  stacks  are  the  cell  position 
samples.  If  each  individual  cell,  i.e.,  its  trajectory,  is  uniquely 
labeled  by  the  integer  number  i,  the  data  we  are  dealing  with  are 
coordinates  of  cell  center  x‘k,  y[,  and  z‘k  corresponding  to  the  time 
points  t'k  along  the  ith  trajectory. 

Measurement  of  cell  positions  from  these  video  images  is  not 
error  free.  To  minimize  errors,  it  is  necessary  to  process  3D  image 
stacks  and  to  compensate  for  distortion  introduced  by  the  optical 
system  and  transmission  of  the  light  through  the  tissue  prior  to 
measuring  the  cell  positions.  Errors  in  the  cell  position  measure- 
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Fig.  1  Computer-generated  image  of  a  cellular  interaction  in¬ 
side  the  lymph  node:  T-cells  (red),  and  dendritic  cells  (green) 

[4] 


ment  are  mainly  to  be  attributed  to  finite  image  resolution  and  the 
significant  and  random  change  in  cell  morphology.  Moreover, 
each  cell  must  be  assigned  to  a  single  trajectory,  a  task  that  may 
be  nontrivial  when  two  cell  trajectories  are  in  close  proximity. 

An  example  of  a  2D  projection  of  trajectories  is  depicted  in  Fig. 
2  in  which  each  trajectory  is  translated  back  to  the  origin  (0,0). 
This  type  of  diagram  has  been  previously  used  [1]  to  detect  the 
presence  of  the  force.  It  is  obvious  that,  while  we  can  use  the  help 
of  such  a  diagram  to  detect  a  constant  force,  it  is  not  useful  if  the 
force  field  is  nonuniform  over  the  visual  field.  For  example,  in  the 
case  of  data  presented  in  Fig.  2,  we  can  see  that  the  force  compo¬ 
nent  along  the  x  axis  is  nonzero  and  may  be  constant  but  it  is 
difficult  to  say  anything  about  v-dependence  of  the  force  field 
based  on  a  figure  of  this  type.  The  cell  trajectories  presented  in 
Fig.  2  are  generated  from  a  realistic  model  of  cell  motility  and 
geometry  of  the  visual  field. 

We  focus  on  the  analysis  of  data  from  B-  and  T-lymphocytes, 
which  have  more  or  less  rounded  shapes.  However,  the  same 
analysis  can  be  also  applied  to  motility  of  irregularly  shaped  cells 
such  as  macrophages,  neutrophils,  and  dendritic  cells.  It  is  only 
necessary  that  the  time-space  scale  of  interest  allows  that  the  cell 
trajectories  are  fairly  well  described  by  the  cell  center  positions. 
Uncertainty  of  the  center  position  due  to  the  cell  shape  irregularity 
can  be  assigned  to  the  intensity  of  measurement  errors. 


Fig.  2  Two-dimensional  projection  of  40  cell  tracks  from  the 
200X150X50  /xm 3  visual  field  provided  by  a  two-photon  video 
microscopy  (model  generated).  Each  track  is  translated  so  that 
the  track  begins  at  the  diagram  origin.  The  cells  can  move  in  all 
directions,  therefore,  the  limits  for  plotting  translated  trajecto¬ 
ries  are  ±200  /urn  and  ±150  /um  along  x  and  y  axes, 
respectively. 


According  to  our  general  model  presented  in  Fig.  3(a),  an  indi¬ 
vidual  cell  senses  the  environment  using  its  receptors  and  pro¬ 
cesses  received  signals  intracellularly.  Intracellularly  generated 
forces  that  act  on  the  cell  structure  arise  as  a  result  of  this  pro¬ 
cessing.  These  forces  along  with  mechanical  forces  act  on  the  cell 
structure,  finally  resulting  in  cell  movement.  While  there  are  at¬ 
tempts  to  understand  chemotaxis  in  more  detail  [7-9],  in  order  to 
estimate  effective  forces  influencing  cell  motility,  we  use  a  gen¬ 
eral  model  without  going  into  details  of  intracellular  signaling  and 
cell  structural  properties.  This  double  integrator  model  represent¬ 
ing  particle  dynamics  [10-12]  is  common  in  representing  the  mo¬ 
tion  of  agents  in  a  multi-agent  systems. 

The  model  describes  motility  of  an  individual  z'th  cell  (agent) 
volume  center.  The  integrator  block  of  the  model  (see  Fig.  3(a)) 
diagram  does  not  describe  the  property  of  the  cell  but  the  natural 
relation  between  cell  velocity  v'  and  its  position  x'.  The  same 
relation  is  presented  by  the  Laplace  domain  transfer  function  1/s 
of  the  model  in  Fig.  3(b). 

Neither  the  velocity  v'  nor  the  cell  position  x'  can  be  measured 
directly.  The  data  we  measure  directly  are  cell  positions  y'  cor¬ 
rupted  by  measurement  errors  ff.  The  error  sources  are  due  to 
limited  image  resolution,  light  diffusion,  and  other  unpredictable 
disturbances  influencing  the  detection  of  the  cell  volume  center.  In 
the  model,  we  assume  that  the  measurement  errors  ff(t)  are  addi¬ 
tive  (see  Fig.  3),  i.e., 

y‘(i)  =  x‘(t)+0(t)  (1) 

In  this  notation,  vectors  x‘,v'  eRd,  where  D  =  3  is  the  spatial  di¬ 
mension.  The  last  equation  is  called  the  observation  model  and  the 
additive  influence  of  errors  6(t)  is  depicted  by  the  summation 
symbol  X  in  Fig.  3(a),  as  well  as  in  the  model  in  Fig.  3(b).  We 
assume  that  the  error  distribution  is  Gaussian  zero  mean,  ff(t) 
~N( 0,®),  and  covariance  ®. 

In  general,  the  cell  motility  velocity  is  a  result  of  mechanical 
forces  of  extracellular  environment  and  the  active  responses  of 
cells  to  gradients  of  chemotactic  and  haptotactic  substances,  i.e., 
extracellular  signals  (see  Fig.  3(a)).  In  the  model  we  deal  with  in 
this  paper  (Fig.  3(b)),  a  persistent  component  of  these  forces  in¬ 
fluencing  the  zth  cell  is  described  by  the  effective  force  vector  f' 
e  Rd.  The  random  force  w'  scaled  with  a  accounts  for  the  sto¬ 
chastic  nature  of  sensed  extracellular  signals  and  complex  me¬ 
chanical  forces  influencing  cell  motility.  We  model  this  force  as  a 
D  dimensional  Gaussian  white  noise  process  of  unit  intensity  W 
=  IDX£>  and  uncorrelated  with  measurement  errors. 

In  our  model,  the  relation  between  the  force  and  velocity  is 
described  by  the  transfer  function  with  the  single  real  pole  -c,  c 
a0.  The  reason  for  this  lies  in  the  fact  that  available  data  present¬ 
ing  the  traveled  cell  distance  versus  the  square  root  of  time  \/r  are 
bent  for  the  small  values  of  the  time,  and  this  is  what  this  simple 
model  predicts  (see  Fig.  4  and  Appendix  A). 
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Square  root  of  time  (nmi  ) 


Fig.  4  Cell  displacements  versus  the  square  root  of  traveled 
time:  parameters  <r=10  and  c=0.1,  1,  10,  and  100 


Only  within  the  limit  of  large  c— >°°  does  the  curvature  of  the 
diagram  vanish,  which  means  that  even  the  highest  frequency 
components  of  the  force  propagate  to  the  velocity.  This  results  in 
the  Brownian  model  of  cell  motility,  which  seems  unrealistic  be¬ 
cause  it  is  expected  that  the  mechanical  structure  of  a  cell  has 
low-pass  frequency  characteristics. 

In  the  time  domain,  our  model  of  the  cell  i  motility  is  given  by 
the  stochastic  differential  equation  with  the  force  term 

x‘(t)  =  v‘(f)  (2) 

v''(f)  =  -  cy\t)  +  rrw'(f)  +  f  (f)  (3) 

We  assume  that  the  force  f'(/)  results  from  a  time-stationary  force 
field,  which  means  that  the  force  sensed  by  the  cell  depends  only 
on  its  position.  Consequently,  we  can  write  f' (?)  =  F(x'(f)),  where 
function  F(x)  maps  the  position  x  to  the  force  vector  and  defines 
the  time-stationary  force  field  that  we  estimate  in  this  work.  In  the 
multi-agent  literature  [12],  the  right-hand  side  of  Eq.  (3)  is  a  con¬ 
trol  of  the  agent  motility,  i.e.,  cell  motility.  The  structure  of  con¬ 
trol  and  stability  analysis  has  been  discussed  in  Ref.  [12],  and  the 
influence  of  stochasticity  term  has  been  discussed  in  Ref.  [13]. 

Let  us  define  the  column  vector  X'^x'VW  i ,  the  so-called 
state  vector.  Now,  we  can  write  the  motility  and  observation 
model  in  the  matrix  form  as 

X'(f)  =  AcX'(r)  +  Lcw'(f)  +  Bcf  (/) 

y'(f)  =  CtX'(/)  +  ff(l)  (4) 

in  which  matrices  Ac,  Lc,  and  Cc,  as  well  as  the  corresponding 
noise  intensity  matrix  W,  can  be  easily  identified  (see  Appendix 
B).  Since  we  are  dealing  with  sampled  data,  it  is  useful  to  intro¬ 
duce  a  discrete-time  model, 

X‘(k  +  1)  =  AdX'(L)  +  Ldw  \k)  +  Bdf  (k) 

yi(k)  =  CAXi(k)  +  ff(k)  (5) 

The  definitions  of  the  corresponding  matrices  Ad  and  Ld  are  pro¬ 
vided  in  Appendix  B.  The  variable  k  corresponds  to  the  time  point 
tk  at  which  the  position  vector  y'  is  sampled  along  the  ith  cell 
trajectory.  We  do  not  provide  here  the  expression  for  the  matrix 
Bd  because  it  depends  on  properties  of  the  continuous  signal  f'(/) 
[14].  Moreover,  in  the  next  section,  we  will  incorporate  the  signal 
f  (t)  into  the  state  vector. 

The  distance  and  time  units  that  we  use  front  here  on  are  mi¬ 
crometers  and  minutes.  In  this  paper,  we  assume  that  the  param¬ 
eters  cr  and  c  of  the  model  are  equal  for  the  same  cell  type.  Based 
on  available  data,  we  take  these  parameters  to  have  the  values  cr 
=  10  and  c=l.  The  intensity  of  the  measurement  error  ®  is  a 
diagonal  matrix  with  diagonal  elements  equal  to  2.52,  correspond¬ 
ing  to  approximately  one-quarter  of  the  expected  cell  diameter  in 
the  case  of  T-  or  B -lymphocytes. 
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4  Force  Model  and  Force  Estimation 

The  force  field  F  that  we  estimate  is  stationary  and  depends 
only  on  the  position  x.  One  way  to  estimate  the  force  field  is  to 
assume  an  analytical  shape  of  the  field,  parametrize  the  shape,  and 
estimate  its  parameters.  The  drawback  of  this  approach  is  that  the 
assumed  form  of  the  field  can  be  unrealistic.  In  the  approach  that 
we  are  presenting  here,  we  avoid  any  kind  of  structural  assump¬ 
tions  about  the  function  F(x)  and  we  exploit  the  information  about 
the  force  field  from  the  cell  trajectory  data. 

Under  the  assumption  that  the  force  field  F(x)  is  stationary  in 
time,  the  force  as  sensed  by  the  ith  cell  is  f'(/)  =  F(x'(f)).  Our 
approach  to  estimate  F(x)  is  to  estimate  first  the  forces  influencing 
cell  motility  along  each  individual  cell  trajectory,  then  use  these 
estimations  to  integrate  a  map  that  visualizes  the  force  field, 
which  is  explained  in  Sec.  7. 

Our  knowledge  about  the  force  f‘(t)  is  minimal  and  trajectory 
dependent.  If  the  field  is  smooth  and  the  cell  motility  is  a  stochas¬ 
tic  process,  the  signal  P(f)  will  be  stochastic  too.  In  the  other  limit, 
if  the  stochastic  motility  component  is  of  small  intensity,  then  f  (?) 
will  be  smooth  and  possibly  well  described  by  a  polynomial  ap¬ 
proximation.  To  include  both  possible  limits,  we  model  f  (?)  as  the 
random  process, 

f‘(t)  =  <f>'Wp(f)  (6) 

The  parameter  crF  scales  the  intensity  of  Gaussian  white  noise 
vector  wF(/),  which  has  the  dimension  D  and  is  uncorrelated  with 
either  w '(/)  or  0{t).  Samples  from  this  stochastic  model,  although 
with  a  small  probability  of  realization,  also  correspond  to  the  va¬ 
riety  of  deterministic  functions,  including  those  with  smooth  force 
variations.  The  intensity  of  the  expected  variations  of  the  force  is 
included  using  the  parameter  crF.  The  force  with  faster  expected 
variations  is  modeled  using  a  larger  crF.  The  force  model  (6)  can 
be  included  into  model  (5),  and  it  results  in  the  difference  equa¬ 
tion 

X'(k  +  1)  =  AX'{k)  +  Lv/'(k) 

y‘(k)  =  CX'(k)  +  ff(k)  (7) 

in  which  the  force  t'(k)  is  incorporated  into  the  augmented  state 
vector  X'  =  [X';f]3flxl  of  the  dimension  3  DX  1  (see  Appendix  B). 

The  optimal  Bayesian  estimation  of  the  state  vector  of  the  dif¬ 
ference  model  such  as  Eq.  (7)  is  a  well-studied  problem  [15].  In 
this  problem,  the  samples  k=  1 ,  k=  2, ,  k=N  are  all  available 
at  the  time  point  at  which  the  state  estimation  is  computed.  This  is 
the  so-called  fixed-interval  smoothing  problem. 

To  avoid  accumulation  of  numerical  errors  and  decrease  the 
amount  of  matrix  inversions,  the  realization  of  the  used  optimal 
smoother  is  in  the  so-called  Rauch-Tung-Striebel  (RTS)  form  of 
the  fixed-interval  optimal  smoother  [15], 

X'(fc)  =  X‘F(k)  +  Kk[X'(L  +  1)  -  Xj,(*  +  1)]  (8) 

Kk  =  P^A'P^ik  +  1 ),  X'(k)  =  XpW  (9) 

In  our  case,  the  computed  state  estimations  X'(k ) 
=  [x'(L) ;  v'(L)  ;f  (k)]  include  the  estimation  of  positions  x'(fc),  ve¬ 
locities  \'(k),  and  forces  i‘{k)  for  all  k=  1, ...  ,N.  The  covariance 
matrix  of  the  state  vector  estimation  is 

P Kk)  =  P‘F(k)  +  Kk[P'(L  +  1)  -  P JT(*  +  l)]Kj  (10) 

with  P'(/V)  =  PF(/V).  The  presented  RTS  formulation  of  the 
smoother  includes  the  forward  iteration,  which  implements  the 
Kalman  filter  for  computing  vectors  XF  and  X'F  and  matrices  PF, 
P^T,  as  well  as  the  backward  iteration  presented  above.  For  details 
on  the  Kalman  filter,  see  Appendix  C. 
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Fig.  5  The  force  estimation  <tf=  0  (top)  and  <rF=0.1  (bottom): 
true  force  (red),  the  forward  (Kalman  filter)  iteration  (green), 
and  the  backward  iteration  (blue).  The  true  force  is  defined  by 
expression  (11). 


Fig.  6  The  force  estimation  o>=1  (top)  and  <rF=  5  (bottom): 
true  force  (red),  the  forward  (Kalman  filter)  iteration  (green), 
and  the  backward  iteration  (blue).  The  true  force  is  defined  by 
expression  (11). 


5  Force  Estimation  Along  the  Trajectory 

Among  all  the  parameters  included  in  the  optimal  smoother,  the 
parameter  oy  can  be  considered  as  the  timing  parameter.  The  fol¬ 
lowing  examples  illustrate  how  this  parameter  influences  the  force 
estimation  and  that  the  stochastic  model  is  general  enough  even  in 
the  case  when  the  f'(r)  is  smooth  and  deterministic. 

Here,  we  generate  the  cell  trajectory  based  on  the  model  (Eq. 
(5))  and  the  force,  which  is 


«*)  = 


5  cos(4^rr/180) 

Jy(k)  . 

5  sin(4kTr/180) 

(ID 


We  are  dealing  here  with  a  single  cell  and  the  index  i  is  omitted 
from  the  notation.  To  make  it  simple,  we  use  here  the  two- 
dimensional  motility  model  D  =  2,  with  parameters  cr=l,  c=l, 
W  =  diag{l ,  1},  and  @  =  diag{2.5z,2.52}. 

To  apply  the  smoother,  we  need  to  take  an  initial  guess  for  the 
state  vector  estimation  X(0);  thus,  we  use 


x(0)=y(0)  and  v(0)  =  — (y(l)  -  y(0))  (12) 

ti  _  to 


because  the  observed  values  y(0)  should  be  close  to  the  true  cell 

position.  For  the  components  of  X(0),  relating  to  the  force  f(0), 
we  use  the  zero  vector.  Following  the  same  reasoning,  we  make 
use  of  the  following  covariance  matrix  of  the  initial  state: 


®2X2 

®2X2 

®2X2 

®2X2 

2© 

®2X2 

(ft  ~  t0)- 

®2X2 

®2X2 

102I2X2 

(13) 


In  the  initial  state  guess  X(0),  the  only  arbitrary  value  is  the  zero 
force  vector.  Therefore,  in  the  covariance  vector  P(0),  we  assume 
that  the  standard  deviation  of  the  force  is  large  and  we  take  value 
10. 

First,  we  consider  the  case  when  oy=0.  The  result  of  the 
smoothing  algorithm  is  presented  in  the  top  panels  of  Fig.  5.  We 
can  see  that,  in  the  forward  iteration,  the  force  estimation  changes 
with  more  data  points  included.  When  the  last  data  point  is  in¬ 
cluded,  the  estimated  force  is  the  best  estimate  based  on  the  as¬ 
sumption  oy=0.  Therefore,  it  is  obvious  why,  in  this  case,  the 
backward  iteration  does  not  improve  the  force  estimation.  The 


smoother  estimate  of  the  force,  for  all  k ,  is  identical  to  the  last 
estimate  of  the  Kalman  filter  in  the  forward  iteration,  which  is  the 
well-known  result  of  the  estimation  theory  [15], 

In  the  bottom  panels  of  Fig.  5,  we  show  the  result  when  the 
parameter  ay  is  slightly  increased.  Now,  oy= 0.1  and  we  can  see 
that  the  estimations  of  the  forward  and  backward  smoother  itera¬ 
tions  are  different.  In  this  case,  the  improvement  gained  by  the 
backward  iteration  is  considerable  if  we  compare  the  phase  shift 
between  the  estimated  force  and  the  true  force  time  signal. 

By  increasing  the  parameter  oy=  1,  we  get  a  better  match  be¬ 
tween  the  true  force  and  the  estimation  in  the  phase  and  amplitude 
(see  Fig.  6,  top).  In  this  case,  ay  is  large  enough  so  that  the 
smoother  can  follow  the  speed  of  the  force  variations.  By  increas¬ 
ing  ay,  the  smoother  is  able  to  follow  even  faster  force  variations. 
However,  not  only  does  a  larger  value  for  ay  extend  the  dynami¬ 
cal  range  for  the  forces  that  the  smoother  can  estimate,  but  it  also 
results  in  a  smaller  attenuation  of  the  noise  propagation  to  the 
force  estimation.  This  is  illustrated  in  the  bottom  panels  of  Fig.  6, 
resulting  from  the  smoother  with  oy=  5.  We  can  notice  that  the 
force  estimation  still  matches  the  amplitude  and  phase  of  the  true 
signal  but  the  estimation  is  noisy. 

These  four  numerical  examples  primarily  illustrate  that  the 
force  estimation  along  the  trajectory  and  based  on  stochastic  force 
models  is  feasible.  They  also  show  that  the  value  of  parameter  ay- 
should  be  small  enough  to  suppress  the  propagation  of  the  noise  to 
the  estimated  force  but  still  large  enough  to  follow  the  force  varia¬ 
tion.  When  the  parameter  value  is  large  enough,  it  influences  the 
noise  of  estimates  and  not  their  expected  values.  Therefore,  we 
see  this  parameter  as  a  smoothing  parameter  of  the  force  field 
visualization  method  that  will  be  explained  in  Sec.  7. 


6  Uniform  Field  Estimation 

The  relation  between  estimation  performance  and  the  cell  mo¬ 
tility  model  parameters  and  dimensions  of  the  visual  field,  as  well 
as  the  direction  and  intensity  of  the  force,  which  is  to  be  esti¬ 
mated,  is  not  simple.  For  example,  one  can  expect  that  the  strong 
force  is  easier  to  identify  and  with  more  confidence  due  to  the 
larger  cell  displacement.  On  the  other  hand,  the  force  field  with  a 
strong  average  value  removes  cells  faster  from  the  visual  field  and 
this  ultimately  results  in  fewer  data  points  that  are  processed  by 
the  estimator.  Fewer  data  points  usually  result  in  a  higher  uncer¬ 
tainty  of  the  force  estimation.  Here,  we  propose  a  method  for  the 
uniform  force  field  estimation  based  on  an  insight  into  the  relation 
of  the  estimation  uncertainty  and  trajectory  lengths. 
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7  Nonuniform  Field  Estimation 


20  40  60  80  100  120  140 

Number  of  samples 


Fig.  7  The  standard  deviation  of  the  constant  force  estimation 
component;  the  fixed-interval  smoother  parameter  <rF=0 


Here,  we  deal  with  the  uniform  force  field,  i.e.,  F(x)  =  F  and 
f'(()=F  for  any  cell  i,  and  the  model  (Eq.  (5))  with  parameters  c 
=  1,  (7=10,  W  =  diag{l  ,1,1},  and  0  =  diag{2.52,2.52,2.52}  that  are 
identified  from  in  vivo  experimental  data.  The  diagram  in  Fig.  7 
shows  the  square  root  of  the  diagonal  terms  [S]n  =  [S]22=[S]33  of 
matrix  S,  which  is  the  submatrix  of  the  Kalman  filter  covariance 
matrix  P  corresponding  only  to  the  force  components,  i.e., 

[S],7  =  [P]2dh2d+3s  U=  1.2,3,  D  =  3  (14) 

All  these  matrices  can  be  computed  based  on  the  model  param¬ 
eters  and  crF=0.  The  diagonal  terms  [S],-,-  and  i  =  1 , 2 , 3  are  equal 
due  to  the  symmetry  of  our  model  regarding  the  change  of  coor¬ 
dinates. 

Based  on  Fig.  7,  we  can  see  that  the  optimal  smoother  with 
parameter  crF=0  (constant  force  assumption)  cannot  shrink  the 
standard  deviation  of  the  force  estimation  below  10%  of  initial 
uncertainity  for  less  than  100  samples  of  the  trajectory.  On  the 
other  hand,  the  cell  trajectory  length  from  experiments  rarely  goes 
over  100  samples.  The  typical  number  of  samples  is  between  15 
and  35.  For  this  reason,  and  in  order  to  increase  the  precision  of 
the  force  estimation,  it  is  beneficial  to  rely  on  the  data  across 
observed  cell  trajectories. 

In  the  case  of  a  constant  force,  estimations  obtained  for  differ¬ 
ent  trajectories  can  be  integrated  based  on  the  maximum  likeli¬ 
hood  approach.  Let  us  assume  that  we  have  applied  the  optimal 
force  estimator  with  crF= 0  to  obtain  the  sequence  of  force  estima¬ 
tions  F‘,  (  =  1,2,...,  and  their  covariance  matrices  S'.  Under  the 
assumption  of  the  linear  model  (Eq.  (5))  and  the  Kalman  filter 
theory  [15],  the  distribution  of  the  estimated  force  is  Gaussian  so 
that  the  joint  likelihood  of  the  estimated  values  is  (see  Appendix 
D) 


£  =  n  A/"(F‘  -  F,S‘)  (15) 

(=1,2,. . . 

where  A/”(-,S')  is  a  Gaussian  distribution  with  a  zero  vector  mean 

value  and  covariance  matrix  S'.  The  vector  F  denotes  the  expected 
value  of  the  constant  force  based  on  data  from  all  cell  trajectories. 
This  expected  value  can  be  found  by  minimizing  the  log  of  like¬ 
lihood  C ,  which  results  in  the  weighted  average 


2  [S']-1 

-1 

E  [S']-1#' 

_  1=1,2,... 

We  can  use  the  optimal  force  estimator  with  crF=0  and  the 
above  expression  when  we  expect  a  uniform  force  field  inside  the 
visual  field.  However,  we  use  the  same  expression  for  the  estima¬ 
tion  of  a  nonuniform  force  field.  We  will  discuss  this  in  the  fol¬ 
lowing  section. 


For  the  estimation  of  the  force  field  F(x)  that  influences  cell 
motility,  we  propose  a  two-step  strategy:  first,  to  estimate  forces 
t'(k)  that  influence  the  cell  motility  at  each  observed  cell  position 
and  second,  to  integrate  all  estimated  values  into  a  map  that  rep¬ 
resents  the  force  field,  taking  into  account  that  the  field  F(x)  is 
stationary  in  time. 

From  the  results  of  the  previous  sections,  we  can  see  that  we 
should  assume  the  intensity  of  the  variation  of  the  forces  f'(/r)  and 
then  properly  tune  the  smoothing  parameter  crF.  To  improve  the 
consistency  of  our  estimation,  instead  of  starting  from  f'  (k)  as  zero 

vectors,  we  estimate  first  the  force  F  in  the  same  way  that  we 
estimate  the  constant  force  in  the  previous  section.  Then,  we  use 
this  vector  as  an  initial  guess  f'(0)  for  the  estimation  of  the  force 
along  the  ;th  cell  trajectory  at  the  Lth  time  point  f'(fc).  The  moti¬ 
vation  for  this  comes  from  the  fact  that  the  force  F  is  the  best 
constant  force  that  can  describe  all  cell  trajectories;  thus,  it  is 

sensible  to  use  it  as  an  initial  guess  for  f'(0)  and  all  (  =  1,2,.... 
With  this  guess  and  the  smoothing  algorithm,  we  can  compute  the 
estimates  of  the  forces  f'(L)  at  the  cell  positions  x'(k)  that  we  use 
in  the  second  step. 

Each  estimate  F(k)  corresponds  to  the  cell  position  x'(L);  in 
other  words,  we  make  identification  that  the  force  field  estimation 
F(x'(L))  =  f'(L).  However,  to  build  a  consistent  estimation  of  the 
force  field,  we  also  assume  that  each  cell  is  influenced  by  the 
same  force  at  the  same  position  in  the  space.  Taking  this  into 
account,  we  can  say  that  for  x;(fem)  *=  x'(fc„)  and  obviously,  km 

+  kn,  we  have  F(x*(km))  ~F(x'(kn)). 

To  estimate  the  field,  we  divide  our  visual  field  into  R  rectan¬ 
gular  regions.  We  denote  each  of  these  regions  as  fl,.,  with  r 
=  1,2,...,/?  being  the  unique  index  of  the  region.  Moreover,  we 
define  the  set  In(fl,.)  as 


In(H,.)  =  {(/,*)  |x'(fe)  e  flr]  (17) 


Then,  the  force  field  F(x)  for  each  region  r,  i.e.,  for*  e  Fl,.,  can  be 
estimated  as  the  weighted  average 


F(x  e  Fl,)  = 

E  [s'wr1 

-1 

2  [s m-'Hk) 

_  (i,k)  e  In(flr) 

_  ( i,k )  e  In(fl,.) 

(18) 


This  expression  is  identical  to  Eq.  (16),  except  for  the  summation 
and  product  that  go  over  all  the  couples  ( i,k )  from  the  set  In(fl,.) 
of  the  specific  region  Fl,.  (see  Appendix  D). 

We  apply  the  described  approach  for  the  estimation  of  the  force 
field  to  40  cell  trajectories  generated  from  the  model  (Eq.  (5)), 
(7=10,  c=l,  W  =  diag{l ,  1 , 1},  and  0  =  diag{2.52,2.52,2.52},  with 
the  same  geometrical  constraints  as  in  the  previous  section.  In  the 
example,  we  divide  the  visual  field  into  /?=100  rectangular  re¬ 
gions  of  20  X  15  fjum2  and  generate  the  data  assuming  the  force 
field 


5 


F(x)  = 


5  sinjrrx/lOO) 


0 


(19) 


To  include  the  fact  that  experimental  data  usually  neglect  short 
cell  trajectories,  we  generated  200  cell  trajectories  and  then  ran¬ 
domly  chose  40  cell  trajectories,  which  had  more  than  16  points. 
Samples  of  the  generated  tracks  are  depicted  in  Fig.  2  in  such  a 
way  that  the  first  point  of  each  trajectory  is  translated  to  the  origin 
(0,0). 

The  result  of  our  approach  to  force  field  estimation  is  presented 

in  Fig.  8.  The  red  arrows  represent  the  force  F  used  as  the  initial 
condition  for  the  estimation  of  f'(L).  To  compute  the  field  estimate 


Journal  of  Dynamic  Systems,  Measurement,  and  Control 


NOVEMBER  2010,  Vol.  132  /  061503-5 


Downloaded  24  Oct  2012  to  152.3.194.9.  Redistribution  subject  to  ASME  license  or  copyright;  see  http://www.asme.org/terms/Terms_Use.cfm 


160 

140 

120 

|  100 
u 

g  80 
t3 

§  60 
^  40 

20 

0 


Fig.  8  Result  of  the  force  field  estimation:  the  cell  trajectories 
(green),  the  constant  force  F  (red  arrows),  the  smoother  esti¬ 
mation  F  (blue  thick  arrows),  and  the  interval  smoother  param¬ 
eter  <rF=0.1.  The  true  field  is  defined  by  expression  (19). 
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F,  we  use  a  decreasing  sequence  of  parameter  crF,  starting  from 
considerable  large  aF=  10,  to  obtain  a  series  of  estimations.  When 
the  region  f lr  contains  less  than  three  trajectory  points,  we  skip 
averaging  the  forces  estimated  in  the  region,  assuming  that  we 
have  little  data  for  averaging.  For  all  regions  ft,.  with  more  than 

three  trajectory  points,  we  compute  the  average  F  and  depict  it 
with  a  blue  thick  arrow.  The  largest  value  of  aF  for  which  the 
estimations  are  consistent  across  the  data  generated  from  different 
simulation  runs  is  crf=0.1.  The  blue  arrows  visualize  the  esti¬ 
mated  force  field  from  one  of  the  simulation  generated  data  sets. 

The  quality  of  the  field  estimation  depends  on  cell  trajectories; 
generally,  the  more  recorded  trajectories  in  the  square  region  ftr, 
the  more  estimates  per  region  available  for  averaging.  In  our  case, 
the  force  component  along  the  x  axis  is  constant  5  (see  Eq.  (19)). 
The  y  component  changes  from  0  at  points  with  x=0  to  5  at  points 
x=50  and  then  back  to  0  at  points  x=100.  Then,  this  force  com¬ 
ponent  changes  its  sign  and  becomes  negative,  —5,  at  points  with 
x=150  and  again  0  at  x  =  200.  The  force  estimations  that  we  ac¬ 
quired  in  Fig.  8  follow  the  above-described  changes  and  give  an 
insight  into  the  field  shape,  although  without  great  precision.  The 

constant  force  F  (red  arrows),  which  estimates  the  average  com¬ 
ponent  of  forces  in  the  visual  field,  is  reliably  estimated. 


8  Conclusions 

Following  a  technological  breakthrough  in  imaging,  we  are 
now  able  to  observe  the  evolution  of  in  vivo  immune  system 
interactions.  In  this  work,  we  have  developed  an  approach  for  the 
estimation  of  the  force  field  influencing  cell  motility.  For  this 
approach,  we  used  the  optimal  smoother  estimator  of  the  forces 
and  formed  a  map  that  visualizes  the  force  field  based  on  its 
estimations.  We  did  not  impose  any  analytical  constraints  on  the 
force  field.  We  introduced  a  spatial  correlation  between  the  forces 
by  integrating  estimations  over  a  grid  of  rectangular  regions.  Con¬ 
sequently,  the  force  field  is  estimated  and  visualized  by  the  aver¬ 
age  forces  assigned  to  each  rectangular  region. 

Using  the  proposed  method,  we  are  confident  to  estimate  the 
intensity  and  direction  of  the  major  constant  component  of  the 
force  field.  We  also  find  that  the  directions  of  the  force  field  are 
satisfactorily  identified.  In  order  to  apply  the  method  for  the 
analysis  of  imaging  data  routinely,  the  rapid  adjustment  of  the 
estimator  parameters  is  important.  For  this  reason,  we  need  further 
consideration  of  real  data,  including  data  collected  from  different 
immune  system  interactions  and  different  organs. 

Estimation  of  the  forces  that  influence  motility  of  immune  sys¬ 
tem  cells  provides  an  additional  powerful  tool  for  the  analysis  of 
two-photon  microscopic  data.  Our  results  show  that  real-time 


video  images  are  sources  of  data,  enabling  visualization,  which 
can  be  used  for  studying  intriguing  spatiotemporal  phenomena 
inside  immune  system  organs.  We  hope  that  our  approach  and 
similar  approaches  can  fill  the  existing  gap  between  contemporary 
technology  and  quantitative  data  analyses  present  in  the  field. 
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Appendix  A 

For  the  illustration,  we  will  derive  the  cell  displacement  for  the 
case  D  =  2.  In  the  absence  of  the  force  f‘=0,  we  can  derive  that  the 
covariance  of  the  x  and  y  components  of  the  position  is 

2u(t)  =  %yy(t)  =  2(r)  =  ^ 

(Al) 

The  probability  density  function  of  the  position  (x,y)  is  Gaussian 
and  by  introducing  the  radial  displacement  S=yx2+y2,  we  find 
that  the  probability  density  function  of  the  displacement  S  and  the 
corresponding  mean  value  are 

p(S)  =  E{S(t)}  =  (A2) 

The  mean  value  scales  with  yt  for  t— 


Appendix  B 

For  the  motility  model  in  two  (D  =  2)  or  three  (D  =  3)  dimen¬ 
sions,  the  matrices  are 
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Let  us  introduce  the  transition  matrix 

<!>(/*,/)  =  A('*~'> 

Then,  the  state  covariance  of  the  continuous  model 
X  =  ACX  +  Lcw(f) 

Y  =  CcX  +  (9(0 

sampled  at  regularly  sampled  time  instants  tb  h+\ “4=A  is  [15] 
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Pk+i  =  ®(4+b4)P*d*(4+t,4)  +  I  d>(4+i ,  t)LcWL'  <b(tk+x,t)‘ dt 

(B5) 

Therfore,  the  regularly  sampled  continuous-time  stochastic  model 
(Eq.  (B3)),  with  constant  matrices  Ac,  Lc,  and  W=I  being  the 
covariance  of  the  vector  w,  can  be  represented  as  the  discrete-time 
stochastic  process 

X(k  +  1)  =  AdX(0  +  Ldw  (B6) 


Y  (k)  =  C  dX(k)  +  6(k) 


(B7) 


where 

Ad  =  AA,  LdLd  =  f  AALC(AAL c)Tdt,  Cd  =  Cc  (B8) 

Jo 

To  compute  samples  of  the  model  (Eqs.  (B6)  and  (B7)),  the 
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Kalman  filter,  and  the  smoother,  we  only  need  the  matrix  LdLd.  matrix  Ld. 

Therefore,  we  do  not  consider  the  problem  of  computing  the  In  the  case  of  matrices  (Eq.  (B8)),  we  have 
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In  the  case  of  the  augmented  state  discrete-time  model  (Eq.  (7)), 
the  matrices  A  and  L  are  defined  by  expression  (B8),  where  A 
=Ad,  L=Ld,  C  =  [Cc  0OXD]  and 
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Appendix  C 

The  Kalman  filter  iteration  is  described  by  the  following  equa¬ 
tions  [15]: 


Xi-(k+\)  =  AXiF(k) 
P‘p(k+  1)  =  APF(/fc)Az’+L7L 


(Cl) 

(C2) 


X‘(D  =  X$(k)  +  KKF[y‘(D  -  CXjTW]  (C3) 

Pif(D  =  [I-KKFC]P‘-(fe)  (C4) 

KKf  =  Pf(DC7[CPp(DC7  +  0]-1  (C5) 

The  first  equation  (Eq.  (Cl))  is  usually  referred  as  the  prediction 
step  because  it  computes  the  model  base  prediction  for  the  follow¬ 
ing  time  point.  Similarly,  Eq.  (C2)  is  the  model  based  prediction 
for  the  covariance  matrices.  Both  of  these  values  are  computed 
starting  from  the  initial  values  of  XF(0)  and  PF(0).  Equation  (C3) 
is  the  so-called  update  step,  because  it  computes  the  optimal  esti¬ 
mation  X'(k)  based  on  the  prediction  X'~(k)  and  observed  data 
y'(fc).  The  Kalman  filter  gain  Kkf  depends  on  the  prediction  co- 
variance  matrix  P^T(/:),  which  is  the  covariance  matrix  of  the  es¬ 
timated  state  vector  X^T(fc)- 


Appendix  D 

In  the  case  of  the  uniform  field  estimation,  the  estimations  F' 
and  FJ,  i  j ,  resulting  from  trajectories  of  two  different  cells  can 
be  considered  as  two  independent  estimates  of  the  force  F.  Be¬ 
cause  of  that,  their  joint  probability  density  function  is  a  product 
of  their  individual  probability  density  functions,  which  are  Gauss¬ 
ian  according  to  the  theory  [15].  Thus,  we  can  say  that  the  likeli¬ 
hood  of  estimations  from  all  available  cell  trajectories  is 


£  =  n  ATF‘-F,S') 


(Dl) 


where  F  is  the  force  F  expected  value.  Maximization  of  this  like¬ 
lihood  in  order  to  find  the  estimation  F  is  equivalent  to  the  mini¬ 
mization  of  its  negative  likelihood 

-log£  =  ^  E  (F'  -  F)7[S']_i(F‘  -  F)  +  C  (D2) 

(=1,2,... 

where  C  includes  all  terms  independent  on  F.  Derivatives  of  this 
expression  regarding  F  are  equal  to  zero  at  the  minimum,  i.e., 
E,-=i  2,. ..[S']-1(F'-F)  =  0,  and  from  this  we  can  conclude  that 


F  = 


E  [sT1 

-1 

E  [stT1' 

i=l,2,. . . 

.1=1,2,... 

(D3) 


In  the  case  of  the  time-stationary  nonuniform  field  F(x),  the  force 
estimations  along  the  two  different  trajectories  f l(kn)  and  P(km), 
i  i=  j  are  independent  random  Gaussian  vectors.  Assuming  that 
F(x)  is  constant  over  the  square  region  Plr  and  neglecting  the 
correlation  among  estimates  from  the  same  cell  trajectory,  i.e., 
between  f'(km)  and  t‘(kn),  km=f=kn,  we  can  write  the  likelihood  of 
the  force  inside  the  region  F(xe  flr)  as 
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C=  n  Mf'W-F(x  e  llr),S‘(*))  (D4) 

(i,/:)eln!!r 

where  F(x  e  fi)  is  the  expected  value  of  the  force  in  square  region 
ft,.,  and  Injj  is  defined  by  Eq.  (17).  Using  this  likelihood  and 
applying  steps  (D1)-(D3),  we  can  derive  Eq.  (18). 
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A.  1.3  Spill  Detection  and  Perimeter  Surveillance  via  Dis¬ 
tributed  Swarming  Agents 

The  following  paper  (on  the  next  9  pages)  was  co-authored  by  Guoxian  Zhang, 
Gregory  K.  Fricke,  and  Devendra  P.  Garg,  and  was  first  published  online  Septem¬ 
ber  2011  in  IEEE/ASME  Transactions  on  Mechatronics.  The  print  version  will 
appear  February  2013  in  Volume  18,  Number  1,  pages  121-129. 
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Spill  Detection  and  Perimeter  Surveillance  via 
Distributed  Swarming  Agents 

Guoxian  Zhang,  Gregory  K.  Fricke,  and  Devendra  P.  Garg 


Abstract — The  problem  of  perimeter  detection  and  monitoring 
has  a  variety  of  applications.  In  this  paper,  a  hybrid  system  of  finite 
states  is  proposed  for  multiple  autonomous  robotic  agents  with  the 
purpose  of  hazardous  spill  perimeter  detection  and  tracking.  In  the 
system,  each  robotic  agent  is  assumed  to  be  in  one  of  three  states: 
searching,  pursuing,  and  tracking.  The  agents  are  prioritized  based 
on  their  states,  and  a  potential  field  is  constructed  for  agents  in  each 
state.  For  an  agent  in  the  tracking  state,  the  agent’s  location  and 
velocity  as  well  as  those  of  its  closest  leading  and  trailing  agents  are 
utilized  to  control  its  movement.  The  convergence  of  the  tracking 
algorithm  is  analyzed  for  multiple  spills  under  certain  conditions. 
Simulation  and  experiment  results  show  that  with  the  proposed 
method,  the  agents  can  successfully  detect  and  track  the  spills  of 
various  shapes,  sizes,  and  movements. 

Index  Terms — Autonomous  agents,  mobile  agents,  multirobot 
systems. 


I.  Introduction 

THE  development  of  swarm  robotics  has  emerged  as  a  tool 
for  mobile  sensor  networks  in  a  variety  of  areas,  such 
as  environment  monitoring  [1],  foraging  [2],  [3],  target  detec¬ 
tion  [4],  and  target  tracking  [5].  Two  different  kinds  of  con¬ 
trol  (centralized  and  decentralized)  can  be  implemented  for  the 
robot  group.  In  centralized  control,  all  agents  in  the  group  are 
assumed  to  be  able  to  share  their  information  [6].  This  kind 
of  control  suffers  from  computational  burden  if  the  number  of 
agents  is  relatively  large.  More  often,  decentralized  control  is 
used  for  swarm  robotics  with  information  shared  only  among 
agents  within  a  local  network  [1],  [7] — [1 1  ].  In  this  paper,  we 
propose  a  method  for  the  problem  of  perimeter  detection  and 
tracking  via  swarm  robots,  which  has  a  variety  of  applications 
such  as  forest  fire  surveillance  [12],  oil  leakage  tracking  [13], 
and  animal  herd  monitoring. 

Some  of  the  previous  studies  in  this  area  have  been  reported  in 
literature  as  follows.  Casbeer  etal.  [12]  proposed  a  decentralized 
multiple  unmanned  aerial  vehicle  (UAV)  approach  to  monitor 
a  forest  fire.  Each  UAV  flew  along  the  perimeter  of  the  fire  in 
one  direction,  and  then,  reversed  direction  upon  meeting  another 


Manuscript  received  September  30,  2010;  revised  March  9,  2011;  accepted 
June  27,  201 1.  Date  of  publication  September  12,  201 1;  date  of  current  version 
September  12,  2012.  Recommended  by  Technical  Editor  R  X.  Liu.  This  work 
was  supported  in  part  by  the  Army  Research  Office  under  Grant  W91 1NF-08- 
1-0106  and  Grant  W91  INF-09-0307. 

G.  Zhang  is  with  Microstrategy,  Inc.,  Tysons  Comer,  VA  22182  USA  (e-mail: 
zhangguoxian@gmail.com). 

G.  K.  Fricke  and  D.  P.  Garg  are  with  the  Pratt  School  of  Engineer¬ 
ing,  Duke  University,  Durham,  NC  27708  USA  (e-mail:  gkf4@duke.edu; 
dpgarg  @  duke .  edu) . 

Digital  Object  Identifier  10.1 109/TMECH.20 11.2164578 


UAV.  This  movement  continued  and  the  movement  of  the  UAVs 
formed  a  latency.  This  method  could  easily  adapt  to  a  change 
in  the  fire  size  and  the  number  of  UAVs  needed  during  the 
process.  Clark  and  Fierro  [13]  and  Cruz  et  al.  [14]  utilized  a 
hybrid  control  algorithm  [15]  to  achieve  the  perimeter  search 
and  monitoring  in  an  unknown  environment;  this  method  is 
susceptible  to  cases  in  which  a  single  perimeter  would  split  into 
multiple  distinct  perimeters.  Bruemmer  et  al.  [16]  utilized  the 
social  potential  field  generated  by  different  kinds  of  sensors  on 
each  agent  to  avoid  collision  and  attract  agents  to  the  perimeter. 

Related  research  in  multiple  robots  is  in  formation  control. 
Olfati-Saber  [17],  [18]  proposed  a  method  to  control  a  flock 
while  maintaining  its  geometric  formation  along  the  way.  A 
potential  function  achieved  local  minima  when  the  distances 
between  each  pair  of  agents  in  the  flock  reached  a  preset  value. 
The  stability  of  flocking  of  a  group  with  fixed  and  dynamic 
topologies  was  studied  by  Tanner  et  al.  in  [  1 9]  and  [20] .  A  social 
potential  field  was  used  to  control  the  distances  between  agents, 
and  each  agent’s  velocity  was  controlled  by  the  difference  in 
the  velocity  from  its  neighbors.  In  these  cases,  the  distance 
between  each  pair  of  agents  converged  to  a  preset  value  that  is 
not  suitable  in  cases  where  the  desired  distance  between  two 
neighbors  changes  with  time. 

In  this  paper,  inspired  by  flocking  agents,  a  control  law  is 
proposed  for  a  group  of  agents  whose  purpose  is  to  search  for, 
detect,  and  track  a  hazardous  spill  in  an  unknown  environment. 
Each  agent  is  assumed  to  have  limited  and  specific  sensing 
and  communication  ranges.  It  is  shown  that  the  group  can  suc¬ 
cessfully  detect  the  spill  and  track  its  boundary  when  the  spill’s 
location  and  size  are  changing.  Simulation  results  showed  excel¬ 
lent  performance  in  tracking  multiple  spills  that  come  together 
or  split  during  the  process.  The  control  algorithm  proposed  in 
Section  III  is  modified  to  fit  the  requirement  of  real  experiments 
and  the  performance  is  verified  with  a  group  of  robotic  agents. 

II.  Problem  Formulation 

A  group  of  autonomous  mobile  agents  Ai,  A2,  ■  ■  .,  An  is  as¬ 
sumed  to  be  initially  deployed  in  the  area  W.  At  time  t,  the  con¬ 
figuration  of  agent  i  is  represented  as  q, (t)  =  [r i(t)T ,  9i(t)]T , 
where  r,  (f)  =  [x,  (f),  y,  ( t)]T  denotes  the  position  of  A,  at  time 
t  and  6i(t)  denotes  its  orientation.  Each  agent  is  assumed  to 
be  equipped  with  a  sensor  whose  field  of  view  (FOV)  is  S,  , 
and  communication  range  is  Af,.  For  simplicity’s  sake,  in  our 
paper,  we  assume  that  both  <S,  and  AT,  are  circles  whose  cen¬ 
ters  are  located  at  r,  (f),  and  are  of  constant  radii  r$  and  r_vi  ■ 
Similar  assumptions  are  commonly  used  by  other  researchers 
(for  example,  see  [17]  and  [21]).  A  number  of  spills  f lj  (t)  with 
the  boundary  0Qj(t)  exist  in  W.  A  spill  Slj(t)  is  defined  as  a 
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connected  subspace  in  E2 .  Each  agent  is  assumed  to  be  able 
to  measure  the  length  of  the  spill  boundary  within  its  FOV. 
This  length  may  be  computed  in  a  variety  of  ways.  One  way 
is  to  utilize  pattern  classification  technology  to  detect  the  spill 
boundary  from  a  visual  sensor  as  implemented  in  [22].  Then, 
the  detected  spill  boundary  is  represented  by  a  series  of  points 
on  it  and  a  straight  line  segment  is  generated  with  each  pair  of 
adjacent  points.  The  spill  boundary  length  is  approximated  by 
summing  the  lengths  of  all  the  straight  line  segments.  When 
d  flj  ( t )  changes,  it  would  be  beneficial  that  the  agent  group  can 
uniformly  distribute  along  the  spill  boundary  since  the  change 
of  a  part  on  dflj  (t)  could  be  detected  by  an  agent  quickly. 

A  method  is  proposed  to  control  the  movement  of  each  agent 
tracking  a  spill.  Only  the  positions  and  linear  speeds  of  an  agent’s 
closest  leading  and  trailing  neighbors  along  its  way  may  be 
utilized  to  generate  a  potential  field  to  control  its  movement. 
An  influence  distance  along  the  spill  boundary,  denoted  by  L, 
controls  how  the  closest  neighbor  affects  an  agent’s  potential 
field.  This  assumption  guarantees  the  scalability  of  the  proposed 
method  and  will  be  discussed  in  Section  III. 

The  problem  can  be  formulated  as  follows:  given  a  group 
of  agents  A\ ,  A2 ,  ■  ■  ■ ,  An  and  their  initial  configurations 
qi  (0),  q2(0), . . . ,  q„  (0),  find  the  path  for  each  agent  i,  where 
i  =  1,  2, . . . ,  n,  such  that  Ve  >  0,  there  exists  a  time  T  >  0 
and  a  spill  index  j,  for  t>T  agent  i  tracks  f and 
1 1  ri  (*)  >  rJV  (rs  ( t ) )  (*)  1 1 90  j  =Lj(t )  satisfies 


I  Lj(t) 


nj  (t) 


<  e, 


Lj(t)  >=  L  -  e, 


if  nj(t)  > 


if  nj  (f )  < 


mm 

L 

L 


(1) 


Here,  \\a,  b\\gc  is  the  distance  from  a  to  b  along  the  curve  dc  in  the 
forward  direction;  N(ji  (t))  is  the  index  of  the  closest  neighbor 
of  Aj  in  the  forward  direction;  rij  (t)  is  the  number  of  agents  that 
are  tracking  f lj  at  time  t ;  and  \\dilj  (f)  ||  is  the  length  of  d£lj  ( t ). 
Furthermore,  the  velocities  of  agents  tracking  the  same  spill 
converge  to  the  same  value.  Mathematically  speaking,  assume 
that  agents  i  and  j  are  tracking  the  same  spill,  Ve  >  0,  3 T  >  0, 
such  that  Vf  >  T,  we  have  | Vi(t)  —  Vj(t)\  <  e.  When  there  is 
no  ambiguity,  we  simplify  each  variable  without  writing  time 
explicitly;  for  example,  dflj  represents  dflj(t). 

The  communication  among  the  mobile  agents  can  be  repre¬ 
sented  as  a  set  of  connected  undirected  graphs,  each  of  which 
can  be  defined  as  Qk  (14  ,£k),  where  Vk  =  \ah  ,  ah  , . . . ,  } 
are  the  agents  in  Qk,  and  £k  =  {(a;,. ,  )  £  Vk  x  Vk  \ ati ,  £ 
14 .  a/  .  £  Mi  }.  The  information,  such  as  a  detected  point  of  the 
spill  boundary  by  an  agent,  is  assumed  to  be  able  to  be  shared 
within  the  graph  with  no  time  delay.  A  typical  example  of  the 
agent  system  with  ten  agents  and  one  spill  is  shown  in  Fig.  1. 


III.  Solution 

A  hybrid  hierarchical  control  technique  is  used  to  control  the 
agents.  Three  behaviors  are  used  for  perimeter  detection  and 
tracking:  searching,  pursuing,  and  tracking.  An  agent  switches 
among  these  states  based  on  the  agent  and  its  local  group 
members’  situations.  The  algorithmic  state  transitions  for  an 


Spill  lost  by  all  members 
of  local  graph 


Searching 

Spill  detected  by  local 
sensors 
OR 

Member  of  local  graph 
is  in  Pursuing  or 
Tracking  state 


Pursuing 


Spill  is  not  within 
tracking  range  of 
local  sensors 


Spill  is  within 
tracking  range  of 
local  sensors 


Tracking 


Fig.  2.  State  transition  diagram. 


individual  robot  are  shown  in  Fig.  2.  This  hybrid  hierarchy  is 
consistent  with  the  embedded  software  principles  of  Alur  et 
al.  [15].  The  system  described  herein  exhibits  architectural  hi¬ 
erarchy  in  its  basic  construction;  the  controller  employs  behav¬ 
ioral  hierarchy  in  its  state  transitions.  In  contrast  to  the  hybrid 
controller  of  Clark  and  Fierro  [13]  and  Cruz  et  al.  [14],  the 
controller  presented  here  requires  only  a  single  tier  of  mode 
switching  with  no  submodes  or  substates  for  obstacle  or  vehicle 
avoidance.  The  potential  functions,  developed  next,  allow  a  con¬ 
tinuous  calculation  and  application  of  control  authority  that  is 
consistent  with  the  goals  of  each  state  and  yet  robust  to  collision 
avoidance. 

The  agents  are  assumed  to  have  a  unicycle  model  in  which 

'  ±i  =  Vi  cos  Oi 
Vi  =  Vi  sin  6i 

(2) 

Vi  =  Oj 
, 0i  =  uJi 

where  Vi  is  the  linear  velocity  of  the  /th  agent,  a;  is  its  linear 
acceleration,  and  u>i  is  its  angular  velocity.  The  control  inputs 
of  each  agent  are  set  up  as  [ut  to*],  where  Ui  =  ai  and  Wi  =  uiz. 
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A.  Perimeter  Search 


In  the  proposed  method,  the  robotic  agents  are  initially  ran¬ 
domly  deployed  in  an  unknown  environment.  The  geometry  of 
each  agent  is  ignored  in  the  system,  i.e.,  two  agents  collide  only 
when  their  positions  are  the  same.  To  avoid  collision,  we  assume 
that  there  is  a  repulsive  potential  field  generated  by  each  agent, 
denoted  as  L}  [23] 


Ui(  r) 


211  (p(r,  r,)  p0) 

0, 


if  p(r,r<)  <  p0 
if  p(r,r.i)  >  po 


where  rj\  is  a  scaling  parameter,  po  is  the  influence  distance 
of  each  agent,  r,  is  the  *th  agent  position,  and  p(r.  r, )  is  the 
distance  between  r  and  r j . 

The  agents  in  different  states  are  assumed  to  have  different 
priority  levels  for  avoidance.  The  agents  in  the  searching  state 
are  assumed  to  be  at  the  lowest  level,  and  the  agents  in  the 
tracking  state  are  at  the  highest  level.  Let  the  index  sets  Is,  Ip, 
and  It  represent  the  indices  of  agents  in  searching,  pursuing, 
and  tracking  states,  respectively.  Assume  that  an  agent  in  the 
searching  state  is  at  r,  and  let  Ns  (r)  =  {i\\/i,  r  £  AT,:}  denote 
the  indices  of  agents  in  which  communication  range  r  lies.  When 
Ns  (r)  yl  0,  the  following  potential  field  is  constructed  for  the 
agent: 

Us{r)  =  ^2  ui(r)  (4) 

ieN,  (r) 

and  the  force  implemented  on  the  agent  is  computed  as 


Ai 


Fig.  3.  Location,  forward  movement  direction,  and  distance  between  adjacent 
mobile  agents  along  the  spill  perimeter. 


Ud(r)  is  constructed  for  the  agent  moving  toward  rd.  It  is  given 
by 

Ud(r)  =  ^V2p{r,rd)2  (7) 

where  772  is  a  scaling  parameter  and  p(r.  r,/)  is  the  distance 
between  r  and  r^.  Let  Np (r)  =  (j|Vi  £  Ip  U  It,  r  £  AT,:}  de¬ 
note  the  set  of  indices  of  agents  in  pursuing  or  tracking  state, 
in  which  communication  range  r  lies.  The  following  potential 
field  is  constructed  for  the  agent: 

Up{r)  =  Ud(r)  +  ^2  Ui(v)-  (8) 

ieNp  (r) 


Fs(r)  =  — VI/s(r).  (5) 

Here,  ||Fs(r)||  is  utilized  as  control  u,  for  the  agent,  and  cu, 
is  set  at  the  value  with  which  the  orientation  of  the  agent  in 
the  next  time  step  is  consistent  with  the  direction  of  Fs  (r) . 
Since  the  travel  distance  and  search  time  are  not  considered 
in  the  objective  of  our  problem,  when  Ns  (r)  =  0,  a  random 
search  technique  in  discrete  time  with  the  input  [<(&),<(&)] 
is  utilized  to  navigate  the  agent  at  q  with  searching  state 

u\  (k  +  1)  =  u\  ( k )  +  b(k)At 

w\  (k  +  1)  =  w\  ( k )  +  c(k)At  (6) 

where  b(k)  is  a  random  variable  having  uniform  distribution 
with  the  support  assigned  by  the  user,  c(k)  is  a  random  variable 
having  normal  distribution  with  a  zero  mean,  and  variance  speci¬ 
fied  by  the  user,  and  At  is  the  discrete  time  period.  When  another 
objective  such  as  travel  distance  is  included,  collaboration  be¬ 
tween  agents  may  be  used  to  improve  the  search  efficiency  (e.g., 
see  Franchi  et  al.  [8]).  Alternatively,  task  allocation  such  as  that 
developed  by  Viguria  and  Howard  [24]  may  be  implemented. 

B.  Pursuit 

When  a  point  lying  on  the  boundary  of  the  spill  is  within 
the  ith  agent’s  sensor  FOV,  a  detection  occurs.  The  zth  agent 
changes  to  a  pursuing  state  if  a  detection  occurs  either  in  its 
FOV  or  by  a  member  of  its  local  communication  graph.  The 
location  of  the  detected  point  is  denoted  by  rd .  A  potential  field 


The  negative  gradient  Fp  (r)  of  the  potential  Up  (r)  is  utilized  to 
navigate  the  agent  in  pursuing  state.  In  practice,  we  assume  that 
there  is  a  maximum  velocity  i>max  to  bound  the  movement  of 
each  agent. 

C.  Tracking  the  Spill  Boundary 

Assume  that  the  agent  i  is  in  the  pursuing  state.  When  ||r*  — 
rd  ||  <  d,  where  d  is  a  capture  distance  defined  by  the  user, 
the  agent  changes  into  the  tracking  state.  It  will  then  move 
along  the  spill  boundary  with  the  interior  of  the  spill  on  its  left. 
The  forward  direction  of  the  agents’  movement  is  as  shown 
in  Fig.  3. 

When  an  agent  moves  away  from  the  forward  direction,  its 
velocity  is  negative.  In  this  case,  the  unicycle  model  in  (2)  can 
still  be  used  by  simply  adding  7r  to  the  orientation.  For  agent 
i.  It  is  defined  as  follows.  Assume  that  at  time  t,  the  zth  agent 
is  tracking  f 1j.  It  will  measure  the  length  of  the  perimeter  and 
positions  of  other  agents  in  the  tracking  state  within  its  FOV. 
Practically,  this  measurement  may  be  performed  by  calibrated 
webcams  (low  cost,  but  low  accuracy),  laser  rangefinders  (higher 
cost  and  high  accuracy),  or  a  system  such  as  that  developed  by 
Pugh  et  al.  [25].  Then,  if  the  closest  neighbor  in  the  forward 
direction,  whose  index  is  represented  by  iV(r, ) ,  is  inside  its  FOV, 
set  It  =  |r, ,  riV(r.  j  ||sn. ;  otherwise,  l,  is  unknown  to  the  agent. 
Assume  that  <9f \j  can  be  represented  by  a  function  s  =  /( r), 
where  s  £  [0,  \\dQj  ||]  indicates  the  length  of  the  perimeter  from 
a  reference  point  to  r  along  the  forward  direction.  Write  /( r,) 
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as  Si  and  s(rjv(r;))  as  Sj+i,  then  when  k  is  known  to  the  agent 
i,  it  can  be  computed  as  follows: 


^2+1  ^2;  tf  ^2+1  A  ^2 

__  Si+i  —  Si  +  ||9f2j||,  if  s,+i  <  Si . 

Define  a  virtual  distance  L,  between  agent  i  and  Aftr, )  as 
li  ,  if  li  <  L 

L,  if  k  >  L  or  li  is  unknown 


U  = 


U  = 


(9) 


(10) 


( i  Lj — i 

dsi  1  l  Li  Lj 


(12) 


Ui  = 


,  ,  1  ii-i 
1  1  A  L? 


fci 


+(uz+i  -  2ui  +fj_i), 

1  LU 


L2 

L 


+  {Vi-\  ~Vi), 


kl  (  J--J2  )+(u*  +  l  -  vi )> 


if  Li  Z,  Li— i  X/ 

if  L[  —  Z,  Li—\  <  Z 

if  L{  <  Z,  Li-i  —  Z 

if  Li  —  Z,  Li—\  —  Z. 

(13) 


Z;  = 


k, 


if  k  <  L 
if  k  >  L 


(14) 


where  L  is  the  influence  distance  between  agent  7  and  agent 
JV(rj),  and  is  set  as  L  =  rs  in  our  paper. 

Assuming  that  L0  =  Ln. ,  i.e.,  the  last  agent  wraps  to  the  first 
agent,  a  potential  function  [26],  [27]  is  proposed  for  agents  i, 
which  is  given  by 

y8(L!,LJ_1)  =  fci(2Zn(LI)  +  %i)  (11) 

where  ki  is  a  positive  constant.  When  L,  <  L  and  L,  i  <  L, 
using  (9)-(l  1),  the  derivative  of  Vi  with  respect  to  s,  is 


where  V  has  a  minimum  at  Li  =  L,  _  | .  Including  the  control  of 
velocity  for  the  agents,  the  following  controller  is  proposed  to 
be  used  as  the  linear  acceleration  input  of  the  7th  agent: 


Fig.  4.  Change  in  at  each  time  step,  (a)  Movements  of  agent  and  spill  have 
intersection;  if  As*  >  0,  angular  change  is  A 0i  \  if  As*  <  0,  angular  change 
is  A '0i.  (b)  Movements  of  agent  and  spill  have  no  intersection;  agent  moves 
toward  the  closest  boundary  point  of  the  spill. 

where  1,  is  the  noisy  measurement  by  agent  B.  This  case  will  be 
studied  in  our  future  study. 

D.  Stability  Analysis 

To  investigate  the  system’s  stability,  we  consider  a  time  t  >  T 
in  which  we  assume  that  all  agents  are  in  the  tracking  state,  and 
the  spills  stop  changing  in  size  and  location.  Another  assump¬ 
tion  is  that  the  number  of  agents  is  large  enough  so  that  each 
agent  is  connected  with  its  leading  and  trailing  neighbors  during 
tracking.  Without  loss  of  generality,  we  relabel  the  agents  so  that 
the  first  n\  agents  track  Di,  the  {n\  +  l)th-(ni  +  7i2)th  agents 
track  Sla ,  and  so  on.  Let  1  =  [L  \ ,  To .... ,  L„  7  .  The  collective 
potential  of  the  system  can  be  given  by 


n0H - \-rij 

v(s)  =  v(i)  =  J2(  E  ^ 

j=  1  \*=noH - bnj-i+1 


(15) 


where  s  =  [si, . . . ,  sn]r;  P  is  the  number  of  spills  when  the 
spills  stop  changing;  no  equals  zero;  and  Mi ,  M2,  ■  ■  ■ ,  Mp  are 
the  lengths  of  the  p  spill  boundaries;  L  \  ,L->, . . . ,  Ln  satisfies 

L\  +  L2  +  •  ■  ■  +  Lni  =  Mi 
Ln  1+1  +  Ln  i+2  +  •  •  •  +  L„1  +  n2  =  M2 


From  (12)  and  (13),  it  is  evident  that  when  Li  goes  to  zero, 
Ui  approaches  minus  infinity.  This  guarantees  that  the  7th  agent 
does  not  surpass  the  ( i  +  l)th  agent,  which  implies  that  the  order 
of  the  agents  is  maintained  when  they  are  moving. 

The  angular  velocity  of  the  7th  agent  is  also  controlled  dis¬ 
cretely.  Assume  that  the  change  in  distance  traveled  by  the  7 1  h 
agent  during  At  is  As,.  The  change  in  ft, ,  denoted  as  Aft,,  is 
computed  in  different  situations,  as  shown  in  Fig.  4. 

If  another  kind  of  agent,  denoted  as  B,  which  has  a  much  larger 
measurement  range,  (e.g.,  an  airborne  agent)  is  available  to 
monitor  the  spill,  while  the  multiple  robotic  agents  are  tracking 
the  spill,  its  measurement  can  be  used  as  a  coarse  measure  of 
li .  This  measurement  of  li  is  likely  to  have  a  lot  of  associated 
noise  due  to  the  low  resolution  of  the  global  measurement.  In 
that  case,  (10)  may  be  changed  to 


'n H - \-np-i  +  l  +  Lni  -| - f  np-i+2  +  ’  ’  ’  +  Ln  —  Mp. 

The  collective  dynamics  of  the  system  is  given  by 


v  =  — 


m  m  dVn 

dsi  ’  ds2  ’  "  ’  ’  dsn 


1  T 


-  £(s)t 


(16) 

(17) 

(18) 


where  v  =  [vi , . . . ,  vn]T  and  £  is  the  graph  Laplacian  as  de¬ 
scribed  in  [17]  and  [28]. 

By  using  Lagrange  Multipliers,  (15)  has  an  extremum  point 
at 


L-\  —  Lo  —  •  •  ■  —  L„,  — 


Mi 

ni 


Lni+l  —  Lji,  _j_  2  —  *  Lni- 


JTl\  H - b Tip  _i+2 


M2 

n2 


=  ...  =  Ln  =  Mp 

np 


(19) 
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Fig.  5.  Example  of  the  collective  potential  value  changing  with  L\  and  L2 . 


Although  it  is  analytically  difficult  to  show  that  this  point  is  the 
only  minimum  for  (15),  numerical  simulations  with  MATLAB 
support  this  assumption.  For  example,  when  n  =  3,  M  =  24, 
p  =  1,  and  k  =  1,  the  value  of  V  is  shown  in  Fig.  5,  where 
the  minimum  of  V  appears  at  L\  =  L2  =  £3  =  8  and  satisfies 
(19).  Other  values  of  n,  M,  and  p  also  satisfy  (19)  as  verified 
by  simulations. 

Let  H (s,  v)  represent  the  energy  of  the  system,  i.e. 


1  n 

H(s,v)  =  V (s)  +  -J2  v2. 


(20) 


Then, 

H{  s,  v)  =  W(s)Tv  — 


dV1  dVn 


dvni  ,  dv2  av„_!  dVn-np+ 1 


dsi  dsi  ’  ’  dsn 


Ln.  _i  1 


dsn 

2  Ln~  2 


v  -  vT£(s)v 
v- vt£(s)v 


1 


l Lni  LI 


£2 ’  ’  £n- 1  £ 


2  T 

1  ±Jn-np  + 1_ 


-  vr£(s)v 
where  £  has  the  form 

£  = 


(21) 


/A 


A 


(22) 


cj 


which  is  a  positive  semidefinite  block  diagonal  matrix  with  rank 
n  —  p,  and  Cj  has  dimension  rij .  In  general,  H  may  be  positive, 
because  the  potential  function  for  agent  i  is  based  on  only  the 
closest  leading  and  trailing  neighbors. 

Therefore,  for  agent  i,  no  information  on  V,  1  and  V, . \  is 

available.  However,  with  bounded  v  and  the  control  in  (13), 
each  Li ,  where  i  =  1, 2, . . . ,  n,  tends  to  be  close  to  each  other, 
and  then,  the  first  set  of  terms  in  curly  brackets  on  the  right  side 
of  (21)  gets  close  to  zero.  Thus,  the  sign  of  H  is  dominated  by 
— vr£(s)v  and  is  negative. 


In  this  case,  H  equals  zero  when  v  is  an  eigenvector  of  C  cor¬ 
responding  to  eigenvalue  zero.  Then,  v  will  be  in  the  space  with 
basis  containing  p  vectors,  denoted  by  b,  =  [c.\  cn } , 

where 


1,  n0  H - 1-  7ij_i  +  1  <  j  <  n0  H - bn* 

0,  otherwise 

(23) 


for  i  =  1, 2, . . .  ,p,  and 


v  —  A.jbi  +  X2b2  +  •  •  •  +  A  bp  (24) 

where  A.;,  i  =  1, 2, . . .  ,p,  is  an  arbitrary  parameter  that  in  this 
case  represents  the  steady  linear  velocity  of  all  agents  on  dil, . 
This  shows  that  the  velocities  of  all  agents  tracking  the  same  spill 
converge  to  the  same  value.  Since  v  is  bounded,  the  convergent 
velocity  is  also  bounded.  When  H  achieves  a  minimum,  V  is 
also  at  its  local  minimum.  From  previous  analysis 


Lni-\ hrij-1  +1  —  ■  ■  '  —  LHij[ - f  rij  j  for  J  —  1)  2,  •  •  • ,  p 

(25) 

is  a  local  minimum  for  V .  Hence,  the  system  achieves  the  stable 
state  with  equal  distance  between  adjacent  agents  and  equal 
velocity  for  all  agents  tracking  the  same  spill. 

Consider  a  case  that  after  all  agents  change  to  tracking  state, 
spill  Hi  does  not  have  enough  agents  tracking  along  its  bound¬ 
ary,  i.e.,  m  <  ||9Hi||/L.  We  claim  that  all  agents  tracking  Hi 
converge  to  the  same  velocity  and  the  distance  between  each 
pair  of  adjacent  agents  converges  to  a  value  no  less  than  L.  The 
energy  of  the  agents  tracking  Hi  is  presented  as 

.  m 

H1  (si ,  vi )  =  V(si )  +  -  vf .  (26) 

Z  i=  1 


Similarly, 


H(  si,  vi) 


2  Ai-i  1 

a’'“’ 


2 

X  — - 

Ln  \  —  1 


L 

L 


"1 

2 

n  1 


-2 


-1 


1 

L\ 


Vi  -  vf  £(si)vi 


(27) 


where  sx  =  [si, . . . ,  sni]  and  Vi  =  [i>i, . . . ,  vni\.  Without  loss 
of  generality,  we  assume  that  at  time  t,  C\  is  a  block  diagonal 
matrix  containing  two  blocks  C\\  and  £12.  In  view  of  the  dis¬ 
cussion  before,  if  no  connection  between  £n  and  £12  holds, 
the  velocity  of  agents  in  C\\  converges  to  v1  and  the  velocity 
of  agents  in  £12  converges  to  v2.  If  v 1  ^  v2,  the  two  connected 
graphs  will  connect  with  each  other  some  time  since  they  are 
tracking  a  perimeter.  Then,  (27)  is  likely  to  be  negative  and 
cause  the  energy  of  the  system  to  decrease  again  until  either  the 
graph  splits  into  several  connected  graphs  or  the  velocity  of  all 
agents  converges  to  the  same  value.  Since  different  velocities 
of  agents  in  various  connected  graphs  cause  a  decrease  of  the 
system  energy,  the  system  energy  achieves  a  local  minimum 
when  the  agents  within  all  connected  graphs  tracking  Hi  have 
the  same  velocity.  In  this  case,  the  first  agent  at  the  forward 
direction  of  each  connected  graph  will  have  a  leading  virtual 
distance  equal  to  L  (i.e.,  the  distance  between  this  agent  and 
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its  leading  adjacent  agent  is  larger  than  L),  which  causes  the 
distance  between  adjacent  agents  within  its  graph  to  be  L  to 
have  a  minimal  energy  as  well.  Therefore,  the  distance  between 
each  pair  of  adjacent  agents  tracking  f2i  is  not  less  than  L. 

IV.  Simulation  and  Experiment  Results 

In  this  section,  simulations  and  experiments  are  described  to 
test  the  performance  of  the  proposed  method  based  on  different 
conditions.  The  influence  of  spill  locations,  size,  and  numbers 
as  well  as  the  agent  group  size  is  studied. 

A.  Simulation  Results 

In  this  section,  we  build  a  simulation  to  see  how  the  agent 
group  adapts  to  the  change  of  the  spill  number  as  well  as  its 
location  and  size.  In  this  environment,  we  assume  that  there 
are  two  spills  at  the  beginning.  As  time  goes  on,  the  two  spills 
merge  into  one  spill,  and  then,  again  split  into  two  spills.  The 
number  of  agents  is  varied  from  6  to  46,  W  is  a  square  arena 
with  sides  of  length  40  m,  r$  =  5  m,  r> f  =  8  m,  Umax  =  8  m/s, 
and  At  =  0.01  s.  All  simulation  results  are  averaged  over  30 
runs.  The  result  of  one  run  with  26  agents  is  shown  in  Fig.  6. 

From  the  time  snapshots  of  the  simulation  process,  we  can 
see  that  the  agents  successfully  search,  pursue,  and  track  the 
spills  no  matter  if  they  are  merged  or  split  into  pieces.  After  the 
spills  stop  changing,  the  velocities  of  agents  tracking  the  same 
spill  converge  to  the  same  value  as  well  as  maintaining  constant 
distance  between  adjacent  agents.  Readers  are  referred  to  [29] 
for  the  results  of  examples  with  other  spills. 

Fig.  7  shows  the  average  time  required  for  all  agents  to  reach 
tracking  state  as  a  function  of  the  number  of  agents.  Five  separate 
simulations,  using,  respectively,  6,  16,  26,  36,  and  46  agents, 
were  run  with  all  other  parameters  set  to  the  same  values  used 
in  the  simulation,  as  shown  in  Fig.  6.  The  results  show  that  as  the 
number  of  agents  increases,  the  group  requires  less  time  for  all 
agents  to  change  the  tracking  state.  This  is  because  with  more 
agents,  it  is  more  likely  that  several  agents  will  be  connected  in 
a  local  graph  and  that  the  possibility  for  an  agent  in  the  group 
to  detect  a  spill  is  increased.  This  decreases  the  time  that  the 
agent  group  needs  to  explore  the  environment  to  detect  spills. 
For  more  than  26  agents,  the  improvement  in  time  decreases 
as  the  number  of  agents  increases.  The  reason  is  that  when  the 
number  of  agents  is  large,  most  agents  are  connected  in  the  local 
graph  and  immediately  achieve  the  pursuing  state  as  soon  as  one 
agent  in  the  local  graph  detects  a  spill;  adding  more  agents  to 
the  group  does  not  decrease  the  time  to  detection  as  much  as  it 
does  when  the  number  of  agents  is  small. 

To  demonstrate  how  the  size  of  the  sensor  FOV  can  affect  the 
process  time  for  all  agents  to  change  the  tracking  state,  another 
simulation  is  conducted  utilizing  1 6  agents,  with  different  r$  val¬ 
ues  of  1,  3,  5,  10,  15,  20,  and  30.  The  results  are  shown  in  Fig.  8. 
The  results  show  that  as  the  sensor  FOV  increases,  the  needed 
time  decreases.  The  rate  of  decrease  tends  to  be  small  when  r$ 
is  larger  than  15.  The  reason  is  that  as  r$  increases,  the  possi¬ 
bility  for  the  robotic  sensor  to  detect  a  spill  increases;  however, 
when  rs  is  large  enough  compared  to  the  size  of  the  workspace, 


(g) 


(h) 


Fig.  6.  Simulation  result  with  changing  number  of  spills,  (a)  Snapshot  at  time 
step  one.  (b)  Snapshot  at  time  step  52.  (c)  Snapshot  at  time  step  400.  (d)  Snapshot 
at  time  step  682.  (e)  Snapshot  at  time  step  690.  (f)  Snapshot  at  time  step  5000. 
(g)  Distance  between  each  agent  and  the  closest  agent  ahead  of  it.  (h)  Velocity 
of  each  agent. 
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Fig.  7.  Average  time  needed  for  all  the  agents  to  change  the  tracking  state  for 
different  numbers  of  agents.  The  results  are  averaged  over  30  runs. 
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Fig.  8.  Average  time  needed  for  all  of  the  agents  to  change  the  tracking  state 
under  different  sizes  of  sensor  FOV.  The  results  are  averaged  over  30  runs. 

the  agent  could  easily  find  the  spill  in  a  short  time,  and  further 
increase  of  rs  will  not  shorten  the  search  time  significantly. 

B.  Experiment  Results 

Several  experiments  using  real  robots  in  a  mixed  simulation 
setting  have  been  conducted.  The  agents  in  these  experiments 
are  iRobot  Create  robots.  Dell  Mini  10  netbook  computers  are 
mounted  to  each  robot,  sending  drive  commands  to  and  receiving 
sensor  data  from  the  Create  base. 

Each  robot  is  equipped  with  reflective  markers  in  a  unique 
pattern  for  identification  to  allow  tracking  via  NaturalPoint  Op- 
tiTrack  V100R2  cameras  and  customized  Tracking  Tools  soft¬ 
ware  on  a  centralized  computer  (see  [30]  and  [31]  for  alter¬ 
natives)  .  The  tracking  data,  akin  to  “indoor  GPS,”  are  avail¬ 
able  to  each  individual  robot  via  wireless  ethernet  (WiFi)  from 
this  central  switchboard.  The  laboratory  space  is  approximately 
3  m  x  6  m,  which  is  well  within  WiFi  range;  use  of  the  switch¬ 
board  computer  enables  the  simulation  of  limited  communica¬ 
tion  range  by  allowing  (respectively,  blocking)  the  communi¬ 
cation  between  robots  that  are  inside  (respectively,  outside)  a 
specified  range. 

The  spills  are  simulated  by  covering  an  area  of  the  floor  with 
a  material  of  contrasting  color.  The  spill  sensor  on  each  robot 
is  a  set  of  four  downward  looking  IR  sensors  mounted  on  the 
front  bumper.  The  robot  cannot  detect  the  spill  until  it  is  already 
sufficiently  close  to  enter  tracking  state.  If  no  other  spill  sensor 
is  equipped,  the  time  spent  in  pursuit  state  in  these  experiments 
will  be  only  one  time  step  if  the  robot  has  directly  detected  the 
spill.  The  only  case  in  which  a  robot  remains  in  pursuit  state  for 
extended  time  is  when  spill  detection  occurs  by  another  member 
of  the  local  communication  graph. 

The  downlooking  IR  sensors  of  the  Create  robot  are  thus  used 
as  the  primary  sensors  for  actually  tracking  the  spill;  when  the 
spill  is  detected  in  these  sensors,  the  robot  switches  into  track 
state. 1 

'For  these  experiments,  the  robots  do  not  make  use  of  on-board  cameras 
although  this  capability  has  since  been  implemented  for  these  robots.  The 
robots  are  now  equipped  with  Logitech  webcams  and  Hokuyo  scanning  laser 
rangefinders,  and  relatively  localize  to  their  neighbors  using  methods  described 
in  [31]  and  [32], 


(a)  (b) 


Fig.  9.  (a)  Experimental  configuration  of  Create  robot  with  Dell  MinilO  net- 

book,  Logitech  Quickcam  Pro  9000,  and  Hokuyo  URG-04LX-UG01  scanning 
laser  rangefinder,  (b)  Perspective  from  camera  no.  5  within  the  system. 

A  sample  active-IR  image  of  multiple  robots  captured  by 
camera  no.  5  is  shown  in  Fig.  9(b).  The  markers  are  clearly  vis¬ 
ible  in  this  full-frame  image,  as  are  the  spill  and  robot  shapes. 
The  robots  in  this  experiment  are  constrained  to  a  plane;  thus, 
the  out-of-plane  distance  measurement  and  the  roll  and  pitch 
orientation  measurements  are  unnecessary  (and  should  be  un¬ 
changing).  For  more  details,  the  reader  is  referred  to  [22], 

For  each  experiment,  the  robots  were  distributed  about  the 
laboratory  space  away  from  the  spill,  with  no  particular  arrange¬ 
ment  or  pattern.  Results  for  two  experiments  are  presented  here 
although  several  different  configurations  of  spills  have  been  con¬ 
sidered  in  this  paper,  including  convex  and  nonconvex  shapes, 
shapes  with  different  areas,  and  contiguous  or  noncontiguous  ar¬ 
eas.  Both  experiments  presented  here  utilized  five  robots  in  the 
configuration  discussed;  several  other  experiments  were  con¬ 
ducted  with  three  or  four  robots,  as  well. 

For  the  experiments  reported  here,  the  communication  range 
is  tm  =  2  m,  the  avoidance  influential  range  is  =  1  m,  and 
track-state  influence  range  is  rs  =  L  =  1  m. 

In  the  preliminary  experiments,  it  was  observed  that  in  track 
state,  the  robots  occasionally  converged  to  a  commanded  speed 
that  is  too  close  to  zero  to  be  realized.  Thus,  in  these  experiments, 
a  minimum  velocity  14m,  =  50  mm/s  was  imposed  to  prevent 
the  robots  from  coming  to  a  stop  while  attempting  to  reach 
equilibrium. 

The  first  experiment  presented  utilizes  a  stationary  rectan¬ 
gular  spill  with  a  perimeter  length  of  4.7  m.  With  five  robots, 
and  rg  =  L  =  1  m,  this  should  yield  equal  spacing  along  the 
perimeter  once  all  robots  achieve  the  track  state.  The  results  of 
this  experiment  are  shown  in  Fig.  10. 

From  the  figure,  it  can  be  seen  that  the  first  four  robots  quickly 
transition  to  track  state.  However,  as  the  robots  are  not  points  but 
rather  extended  bodies,  some  additional  time  is  required  for  the 
fifth  robot  to  reach  the  perimeter.  This  is  due  to  the  avoidance 
hierarchy  that  forces  a  nontracking  robot  to  yield  to  one  that 
is  tracking.  Thus,  a  sufficient  gap  must  be  present  before  the 
robot  can  merge  onto  the  perimeter.  Once  all  five  robots  are  on 
the  perimeter,  the  speeds  do  indeed  converge,  and  the  spacing 
reaches  an  equilibrium. 

The  second  experiment  presented  utilizes  the  same  stationary 
spill  of  the  first  experiment,  but  with  a  second  smaller  spill 
with  a  perimeter  length  of  2.5  m  that  moves  over  time.  The 
results  of  this  experiment  are  shown  in  Fig.  1 1 .  Initially,  the  two 
spill  components  are  overlapping;  thus,  the  initial  evolution  is 
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Fig.  10.  Experimental  result  for  five  robots  with  a  stationary  large  spill,  (a)- 
(e)  Snapshots  of  selected  time  points  indicating  the  location  of  the  spill  and  the 
agents,  (f)  Linear  speeds  of  all  agents. 


similar  to  that  of  the  first  experiment.  Beginning  at  time  step 
500,  the  smaller  spill  is  gradually  moved  away  from  the  larger 
spill,  increasing  the  perimeter  length  and  eventually  separating 
into  a  second  small  spill;  snapshots  of  this  transition  are  seen  in 
Fig.  1  l(d)-(e).  During  the  growth  phase  of  the  spill,  the  robots 
dynamically  increase  their  spacing  to  accommodate  the  longer 
perimeter.  Once  the  smaller  spill  separates,  one  robot  continues 
to  track  it  with  no  change  in  its  velocity;  on  the  larger  spill,  the 
remaining  four  adjust  their  relative  spacing  to  accommodate  the 
shorter  perimeter  and  lost  robot. 

At  time  step  700,  the  smaller  spill  rejoins  the  larger  spill. 
This  increased  perimeter  length,  as  illustrated  in  Fig.  11(f),  is 
detected  by  the  next-passing  robot  around  time  step  730.  This 
causes  a  large  increase  in  its  speed  because  its  leading  robot 
is  suddenly  much  further  away  along  the  perimeter.  As  seen 
in  Fig.  11(h),  the  speeds  of  most  of  the  robots  briefly  increase 
(around  k  =  700)  when  the  smaller  spill  rejoins  the  larger  spill. 
The  reintroduction  of  the  fifth  robot  into  the  network  causes  a 
disruption  to  the  equilibrium,  which  quickly  settles  out.  Finally, 
the  small  spill  is  completely  engulfed  by  the  larger  spill,  again 
resulting  the  convergence  as  seen  in  the  first  experiment. 

Videos  of  the  presented  experiments  and  simulations,  as  well 
as  other  experiments  not  included  here,  can  be  found  at  the  Duke 
RAMA  Lab  YouTube  channel  [33], 

V.  Conclusion 

In  this  paper,  a  hybrid  system  of  finite  states  with  feedback 
control  is  advanced  for  a  group  of  agents  with  a  limited  sensor 
FOV  and  a  limited  communication  range  to  search  for,  detect, 
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Fig.  1 1 .  Experimental  result  for  five  robots  with  a  large  spill  that  grows,  splits 
into  two  spills,  and  then  collapses  to  one  large  spill,  (a)— (g)  Snapshots  of  selected 
time  points  indicating  the  location  of  the  spill  and  the  agents,  (h)  Linear  speeds 
of  all  agents. 


and  track  a  number  of  spills  in  an  unknown  environment.  A 
hierarchical  potential  field  is  designed  for  agents  in  different 
states  to  control  their  movement  that  provides  a  simpler  control 
law  for  collision  avoidance.  Simulation  and  experiment  results 
demonstrate  that  the  agents  can  successfully  detect  and  track 
spills  whose  location,  shape,  size,  and  number  may  change  over 
time.  Significantly,  this  adaptation  occurs  automatically  with 
no  need  to  specify  a  predetermined  distance  between  adjacent 
agents.  Further  focus  on  search  efficiency  and  optimal  conver¬ 
gence  rates  is  ongoing.  Additional  enhancement  of  the  robots’ 
sensing  ability  continues. 
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ABSTRACT 

A  strategy  is  described  that  utilizes  a  novel  application  of  a 
potential-force  function  that  includes  the  tuning  of  coefficients 
to  control  mobile  robots  orchestrated  as  a  distributed  multi¬ 
agent  system.  Control  system  parameters  are  manipulated 
methodically  via  simulation  and  hardware  experimentation  to 
gain  a  better  understanding  of  their  impact  upon  mission 
performance  of  the  multi-agent  system  as  applied  to  a  pre¬ 
determined  task  of  area  exploration  and  mapping.  Also 
included  are  descriptions  of  experiment  infrastructure 
components  that  afford  convenient  solutions  to  research 
challenges.  These  consist  of  a  surrogate  localization  (position 
and  orientation)  function  utilizing  a  novel  MATLAB 
executable  (MEX)  function  and  a  user  datagram  protocol 
(UDP)-based  communications  protocol  that  facilitates 
communication  among  network-based  control  computers. 

Keywords:  Control,  Multi-agent  System,  Swanning  Robots, 
Potential  Function. 


1.  INTRODUCTION 

From  the  inception  of  the  development  of  robotic  technologies, 
researchers  have  generally  concluded  that,  with  many  if  not 
most  applications,  there  are  significant  advantages  to  be  gained 
through  the  coordination  and  orchestration  of  multiple  robotic 
entities  [1-2].  This  is  not  difficult  at  all  to  imagine  by  simply 
observing  Darwinian  development  in  biological  systems 
whereby  two  or  four  legs  proved  superior  to  one  for 
locomotion;  or  separate  arms  each  with  a  hand,  fingers,  and  an 
opposing  thumb  proved  superior  for  grasping  and  cooperatively 
manipulating  objects.  Furthermore,  it  similarly  is 
understandable  that  a  small  group  of  separate  robotic  entities 
(e.g.,  a  small  number  of  cooperating  or  coordinating  unmanned 
ground  vehicles)  could  also  exhibit  significantly  desirable 
characteristics  in  certain  circumstances.  Five  main  motivations 
for  using  multi-robot  systems  include:  task  complexity;  task 
distribution;  resource  distribution;  parallel  processing;  and 
robustness  through  redundancy  [3],  Some  researchers  have 
explored  the  advantages  gained  for  specific  tasks  such  as 
exploration  where  a  specified  task  can  be  accomplished  by  a 
number  of  agents  surveying  the  maneuver  space  versus  a  single 
robot  tasked  with  exploring  the  entirety  by  itself  [4],  Indeed, 
these  become  valuable  against  overarching  task 
accomplishment;  however,  not  without  burden.  Multi-robot 
systems  add  communications  overhead  and  workload  with  the 
requirement  for  inter- robot  cooperation  and  knowledge.  They 


also  add  complexity  to  path  planning  and  collision  avoidance 
schemes  since  each  robot’s  cooperating  peer  also  exists  as  an 
obstacle  to  be  avoided  [5], 

Research  over  the  past  years  has  centered  on  organizing  several 
to  tens  of  mobile  robots  (or  agents)  into  coordinated  multi¬ 
agent  systems  (MAS)  [6-8],  Further,  the  quantity  of  these 
systems  may  reach  into  the  hundreds  or  thousands  that 
comprise  a  robotic  swarm.  In  contrast  to  single  mobile  robots 
with  their  broad,  comprehensive  capabilities  (vision,  FIDAR, 
differential  global  positioning,  ultrasonic  proximity  sensing, 
etc.),  the  agents  comprising  a  MAS  tend  to  have  somewhat 
lesser  organic  capability  due  to  a  desired  affordability  or 
individual  physical  size.  An  example  of  one  such  capability  is 
localization  that  could  provide  accurate  agent  position  and 
orientation  in  a  global  reference  akin  to  the  Global  Positioning 
System  (GPS).  While  GPS  receivers  have  become  small  and 
inexpensive,  they  cannot  be  used  indoors  where  MAS 
experimentation  and  utilization  typically  exist.  Research  in  the 
field  has  centered  on  understanding  the  root  mechanisms  and 
control  features  that  can  define  and  drive  the  behavior  and 
capability  of  the  MAS  architectures  in  terms  such  as  task 
allocation,  flocking,  or  foraging.  The  study  of  analogous 
biological  systems  has  also  provided  valuable  insight. 
However,  to  date,  there  has  been  little  work  done  with  respect 
to  understanding  the  capability  of  the  MAS  as  a  function  of  its 
composition  given  the  option  of  pre-mission  selection  of  the 
performance  and  capability  of  individual  agents,  as  well  as 
opportunity  to  include  extra-system  infonnation  sources  or 
sensors. 

Various  attempts  at  defining  MAS  composition  [9-1 1]  explored 
heterogeneous  versus  homogeneous  systems,  highlighting 
concepts  such  as  hierarchic  social  entropy,  diversity,  and 
cooperative  localization.  Heterogeneous  MAS  have  been  used 
to  explore  unknown  environments  for  the  purpose  of  mapping 
and  exploration  [12],  Here,  each  MAS  featured  varying  types 
of  single-mode  sensor  payloads  including,  for  example,  an 
infrared  camera  or  sonar,  coordinated  in  an  occupancy  grid 
Bayesian  mapping  algorithm.  But  again,  the  composition  of 
the  architecture  was  held  constant. 

An  example  of  a  MAS  configuration  is  illustrated  in  Figure  1. 
Here,  four  differential-wheeled  robotic  agents  featuring 
distributed  control  architectures  orchestrate  to  conduct  a 
mapping  of  an  initially-unknown  discrete  arena  to  localize  and 
identify  targets  and  obstacles.  Each  agent  has  imaging  and 
proximity  sensing.  They  pass  information  to  each  other  via  a 
consolidated  state  matrix  that  receives  data  from  each  agent 


and  simultaneously  provides  state  updates  to  each  agent. 


Figure  1 :  Representative  Multi-agent  System  Arena. 

The  underlying  control  architecture  plays  an  absolutely  critical 
role  in  the  efficiencies  observed  in  the  multi-agent  versus 
single-robot  architectures  [13-16],  Clearly,  beyond  the 
inherent  capability  of  the  electro-mechanical  configuration  as 
well  as  the  sensor  payloads,  the  single  greatest  contributor  to 
mission  efficiencies  and  task  capabilities  will  exist  in  the 
capability,  adaptability,  and  comprehensiveness  of  the  control 
system.  A  wide  variety  of  algorithms  and  schemes  have  been 
explored  including  feedback  laws,  leader  follower,  task- 
oriented  mission  planning,  task  reallocation  and  reordering,  and 
deliberative  and  reactive  behavior  modeling.  Beyond  these, 
potential-function-based  robot  control  strategies  exhibit 
significant  promise  to  control  payload-agile  MAS. 

This  paper  is  organized  as  follows:  Section  2  presents  a 
potential-function  based  robot  control  strategy  that  uses 
parametric  tuning  factors  to  increase  the  efficiency  with  which 
the  control  strategy  can  accomplish  an  exploration  task  with  no 
a  priori  information  of  the  search  space.  Section  3  describes 
results  from  the  research’s  simulation  phase.  Section  4 
describes  the  creation  of  a  novel  surrogate  global  positioning 
system  based  on  a  task-specific  MATLAB  MEX  function 
sampling  an  optical  motion  capture  system  broadcasting  over  a 
network  socket.  Additionally,  this  section  summarizes  a  UDP- 
based  data  transmission  architecture  that  enabled  inter-agent 
sharing  of  realized  occupancy  map  elements.  Sections  5  offer 
the  results  from  the  research’s  experimentation  phase  whereby 
increasing  quantities  of  agents  with  various  capabilities  were 
placed  in  the  arena  and  evaluated  for  task  accomplishment 
metrics  such  as  time  to  complete.  The  last  sections  summarize 
the  research  results  and  conclusions. 


2.  POTENTIAL-FUNCTION-BASED  ROBOT  CONTROL 
STRATEGY 

Control  via  potential-function  theory  has  proven  to  be  a 
valuable  basis  upon  which  more  complex  control  feedback 
influence  can  be  implemented.  In  particular,  potential-function 
theory  affords  a  comprehensive  and  simple  method  to  add 
factors  that  can  be  scaled  either  as  independent  input  functions 
or  those  dependent  upon  selected  system  features.  The  basis  of 
the  potential  function  is  to  represent  system  control  influences 
as  independent  or  dependent  forces  [17]  akin  to  an  electrical 
potential  field.  Each  force  is  calculated  based  on  physical 
parameters  (such  as  real  distances)  or  on  representative  values 
(such  as  arbitrary  repulsion  to  facilitate  obstacle  avoidance  or 


attraction  to  a  desired  target  location).  In  conjunction  with  the 
potential  function,  it  is  convenient  to  represent  the  operational 
space  as  an  occupancy  map,  i.e.,  a  two-  or  three-dimensional 
grid  that  sub-divides  the  space  into  identifiable  cells.  Each  cell 
would  have  characteristics  such  as  a  potential  related  to  its 
occupancy  status,  or  status  as  a  target  destination.  Since  the 
potential  function  depends  upon  proximity,  the  relative  location 
of  the  cells  would  also  be  cataloged. 


The  goal  of  the  control  function  for  the  MAS  is  to  calculate  the 
agent-unique  potential  force  witnessed  by  each  of  the  agents 
comprising  the  MAS.  Force  defines  the  resultant  heading  and 
speed  that  the  individual  agent  would  follow.  The  potential 
force  existent  on  the  ith  agent  is  given  by: 
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where, 


Qi  =  inherent  charge  of  the  ith  agent, 

Qj  =  inherent  charge  of  the  j“  agent, 
qj  =  inherent  charge  of  the  j,h  cell, 
radj  =  radius  vector  from  the  ith  agent  to  the  jth 
agent  or  jth  cell, 

Nc  =  number  of  cells  in  the  occupancy  map, 

Na~  total  number  of  agents, 

P  =  calculated  force  contribution  from  specified 

payloads,  and 

lix  =  tuning  coefficients. 

Then,  the  resultant  heading  angle  (0;)  of  the  ith  agent  is  given 
by: 


0;  =  tan  1 


(2) 


The  calculation  of  P  will  vary  with  payload  type  as  the  value 
and  weighting  of  each  payload  type  can  and  should  have 
varying  degrees  of  influence  on  the  resultant  potential  force 
calculation. 

Tuning  coefficients  (nx)  provide  a  mechanism  whereby  the 
relative  contribution  of  each  of  the  potential  forces  can  be 
directly  tuned  either  as  an  independent  variable,  or  dynamically 
tuned  based  on  a  feedback  mechanism.  This  feature  is 
particularly  important  in  multi-agent  systems  due  to  scalability. 
Since  the  potential  forces  contributed  by  elements  within  the 
system  are  cumulative,  it  becomes  very  important  to  gain  an 
awareness  of  the  proportional  contributions  and  to 
subsequently  be  able  to  control  them  proactively.  To  illustrate, 
assume  an  occupancy  grid  of  120  by  240  cells  or  a  total  of 
28,800  total  cells.  Each  cell  might  offer  a  contributing  charge 
value  from  0  to  1  meaning  a  span  of  28,800  charge  units.  If 
other  contributors  merely  summed  to  a  maximum  of  1,000 
charge  units,  the  cells’  contribution  would  easily  overwhelm 
any  potential  contribution  calculation. 


The  occupancy  map  cell  characteristic  vector  represents  the 
probability  that  the  cell  is  occupied.  A  value  of  1.0  indicates 
that  there  is  a  100  percent  chance  that  the  cell  is  occupied.  A 
value  of  0.0  indicates  that  there  is  a  100  percent  chance  the  cell 
is  unoccupied.  A  value  of  0.5  would  indicate  an  equal 
probability  that  the  cell  is  either  occupied  or  unoccupied. 
Separate  variables  that  provide  valuable  influence  in  the  force 
calculation  are  the  number  of  times  that  a  cell  has  been 
sampled  by  a  sensor,  referred  herein  as  the  “cell  visit  count,” 
and  the  number  of  times  that  a  fraction  of  the  entire  occupancy 
map  has  been  visited.  The  usefulness  here  is  to  accommodate 
the  dynamic  nature  of  agents  moving  in  the  arena  as  well  as  the 
potential  for  moving  targets  in  the  field  of  regard.  As  a  result, 
the  charge  associated  with  each  cell  is  given  by: 

n  {PoRmax  for  0<V<n  /qx 

'll  1  0  for  V>n 


where, 

P0  =  cell  occupancy  probability, 
qmax  ~  maximum  possible  cell  charge, 

V  =  cell  visit  count  defined  as  the  number  of 
visits  to  the  ilh  cell,  and 

n  =  number  of  times  that  a  percentage  of  the 
cells  have  been  visited. 


3.  INITIAL  OBSERVATIONS  THROUGH  SIMULATION 

Initially,  the  Cyberbotics’  Webots  simulation  package  was  used 
to  represent  a  120  by  240  centimeter  arena  (Figure  1)  occupied 
by  e-puck  differential-wheeled  robots.  The  arena  is  sub¬ 
divided  into  a  representative  occupancy  mapping  grid 
consisting  of  1  cm  by  1  cm  cells  each  having  an  associated 
vector  representing  dynamic  characteristics  such  as  the 
probability  of  occupancy  or  potential  charge.  The  MATLAB 
scripting  language  was  utilized  to  provide  high-level  potential- 
function  based  path  planning  as  well  as  map  databasing. 

A  comprehensive  set  of  state  and  evaluation  metrics  are 
collected  at  every  control  time  step  that  provide  an  opportunity 
to  observe  the  progression  of  the  exploration  task.  Agent 
trajectories  (Figure  2)  provide  insight  to  the  energy  consumed 
by  each  individual  agent  in  terms  of  the  distance  travelled. 
Trajectories  also  illustrate  the  qualitative  uniformity  and 
dispersion  with  which  the  team  of  agents  uniquely  maneuver  in 
and  canvas  the  arena. 


Figure  2:  Agent  Trajectories. 


The  occupancy  map  data  vector  (Figure  3)  at  a  specific  time 
step  is  displayed  to  show  the  occupancy  probability,  cell 


charge,  and  cell  visit  count.  Observed  individually  and 
sequentially  over  time,  these  vectors  provide  insight  to  the 
progression  of  understanding  of  the  heretofore  unknown  arena 
and  resolution  with  which  obstacles  and  targets  appear.  One 
can  also  conclude  trends  such  as  the  extent  of  exploration  and 
behaviors  proximate  to  obstacles. 
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Figure  3:  Occupancy  Map  Data  Vectors. 

Finally,  mean  characteristic  values  graphed  over  time  (Figure 
4)  quantify  the  performance  of  the  multi-agent  system  and 
illustrate  its  ability  to  progress  toward  threshold  goals.  The 
average  occupancy  map  value  shows  the  time  progression  of 
the  average  across  each  cell’s  probability  of  occupancy.  Since 
there  is  no  a  priori  knowledge  afforded,  the  initial  value  is  0.5 
meaning  all  cells  have  equal  probability  of  being  occupied  or 
unoccupied.  Over  time,  as  the  agents  explore  the  arena  and 
determine  free  and  occupied  space,  the  curve  progresses  as 
expected  decreasing  over  time  with  the  free  space  determined 
and  step  increasing  periodically  with  the  sensor-enabled 
detection  of  obstacles,  i.e.,  occupied  space.  The  curve  is  non¬ 
linear  due  to  the  proximity  of  unexplored  space  diminishing 
over  time  resulting  in  the  necessity  to  travel  longer  distances 
(and  consuming  more  time)  to  sample  those  unexplored  spaces. 
The  average  cell  charge  plots  the  average  of  all  the  cells’ 
calculated  charge  values  over  time.  This  curve  initially 
decreases  due  to  the  fact  that  as  cells  are  explored,  their  charge 
decreases  so  as  to  have  less  impact  in  subsequent  path  planning 
loops.  However,  in  deference  to  the  dynamic  state  of  the  arena, 
the  average  cell  charge  begins  to  increase  in  correlation  with 
the  map  coverage  count  due  to  the  forcing  relationship  in 
Equation  (3).  The  average  cell  visit  count  tracks  the  average 
number  of  times  that  any  agent  samples  individual  cells  over 
time.  This  curve  has  a  constant  slope  as  expected  knowing  that 
the  agents’  speed  varies  very  little  perhaps  only  to  slow  for 
short  period  of  times  while  maneuvering  around  obstacles. 
Finally,  the  map  coverage  count  ( n ,  Equation  3)  is  an  evolving 
parameter  that  tracks  the  number  of  times  that  a  pre-defmed 
percentage  of  the  arena  has  been  sampled  at  least  the  count’s 
number  of  times.  For  example,  the  count  starts  at  zero.  As 
soon  as  75  percent  of  the  arena  has  been  sampled  once,  the 
count  increments  to  1.  As  soon  as  75  percent  of  the  arena  has 
been  sampled  twice,  the  count  increments  to  2.  This  parameter 


is  used  within  the  control  loop  to  influence  the  charge 
calculation  in  order  to  accommodate  the  dynamic  nature  of  the 
state  representation.  The  shape  of  this  curve  provides 
qualitative  clues  as  to  the  efficiency  of  the  search. 


Figure  4:  Characteristic  Value  Histories. 


4.  DISTRIBUTED  MAS  EXPERIMENTATION 

The  transition  from  simulation  to  hardware  experimentation  is 
a  non-trivial  task.  There  are  many  real-world  challenges  in 
localization  and  communications  that  can  be  inaccurately 
represented  as  perfection  in  the  simulation.  In  support  of  on¬ 
going  MAS  research  at  Duke  University,  an  arena  (Figure  5) 
has  been  created  consisting  of  a  120  by  240  centimeter  walled 
stage  hosting  various  numbers  of  e-puck  robots  [18],  obstacles, 
and  targets.  The  e-puck  robot  is  approximately  7  cm  in 
diameter.  It  features  a  Microchip  dsPIC  processor  running  at 
60MHz  or  about  15  MIPS.  Mobility  is  provided  by  two  stepper 
motors  in  a  differential-wheel  drive  configuration.  The  robot 
has  eight  infra-red  sensors  positioned  around  the  robot’s 
perimeter  that  provide  both  proximity  sensing  and  light 
sensing.  There  is  a  640x480  pixel  camera  on-board  that  can 
provide  video  and  imagery  in  either  color  or  black  and  white 
modes.  Bandwidth  limitations  force  pragmatic  collection  to 
52x39  or  640x1  pixel  images  at  up  to  three  frames  per  second. 
There  are  three  omni-directional  microphones  for  sound 
localization,  a  three-axis  accelerometer,  and  on-board  speaker. 
Communications  is  provided  by  Bluetooth.  Each  robot  has  a 
dedicated  desktop  computer  that  remotely  controls  the  robot  via 
Bluetooth  communications.  This  dedicated  computer  allows  for 
substantial  processing  capability,  image  processing  in 
particular,  that  simply  isn't  supportable  via  the  e-puck  limited 
on-board  processing  capability.  The  control  software  is  written 
in  MATLAB  programming  language.  The  e-puck  robots  have 


no  inherent  localization  capability. 


Figure  5:  MAS  Experimental  Arena. 

Two  novel  components  of  this  experimentation  infrastructure 
required  to  fully  replicate  the  simulation  environment  include  a 
surrogate  global  positioning  system  (GPS)  and  data  packet 
transfer  between  agent  controllers.  Both  capabilities, 
developed  under  the  auspices  of  this  research,  are  described  in 
the  following  subparagraphs. 

Surrogate  GPS  via  External  Optical  Motion  Capture  and 
Tracking 

Much  research  effort  thus  far  has  been  focused  on  the 
simultaneous  localization  and  mapping  (SLAM)  problem  [19]. 
SLAM  acknowledges  that  externally-provided  localization 
information  is  not  typically  available,  and  also  acknowledges  a 
scenario  of  exploration  of  heretofore  unknown  space. 
Therefore,  clever  schemes  are  explored  that  use  available 
modalities  (e.g.,  inter-agent  communications)  to  determine 
relative  localization  among  the  agents.  That,  combined  with 
robot-sensor-enabled,  continually-updated  mapping  databases, 
generates  a  comprehensive  representation  of  the  spatial  and 
spectral  environment.  Many  times  though,  inherent  SLAM 
functionality  is  not  germane  to  the  research  problem  being 
pursued.  In  this  case,  an  extra-MAS  localization  capability  is 
most  desirable  to  minimize  the  experiment’s  independent 
variables. 

There  are  several  materiel  possibilities  for  an  extra-agent 
surrogate  GPS  capability.  Fricke  et  al  [20]  utilized  a 
combination  of  a  Cognex  camera  and  Sick  LIDARs  to  calculate 
localization.  The  Vicon  motion  capture  system  [21]  is  a  popular 
solution  that  uses  connected  high-speed  digital  cameras 
viewing  the  MAS  experiment  workspace  with  redundant 
coverage  to  provide  localization  of  tracked  objects. 

The  Natural  Point  OptiTrack™  system  [22]  utilizing  its 
'Tracking  Tools'  (TT)  software  application  (Figure  6)  provides 
a  similar  capability.  The  OptiTrack™  cameras  are  installed  and 
calibrated  for  the  desired  experiment  workspace.  Each  robot  is 
given  a  unique  spatial  configuration  of  reflective  markers 
(Figure  7)  that  enables  individual  robot  identification  during 
the  tracking  process.  A  critical  functionality  important  to 
supported  experimentation  that  TT  offers  is  the  ability  to 
multicast  tracked-object  localization  in  real  time  through  a 
designated  network  port  on  the  computer  hosting  the  Tracking 
Tools  application.  If  connected  to  a  local-area  network,  any 
other  computer  (e.g.,  one  hosting  an  agent  controller)  could 
also  have  near-instantaneous  access  to  the  real  time  streaming 


position  information  of  itself,  and  of  any  other  agent  in  the 
workspace  with  update  rates  approaching  100  Hertz.  This  is 
particularly  desirable  as  this  streaming  broadcast  scheme 
significantly  reduces  the  inter-agent  communications 
requirement.  TT  offers  three  formats  for  broadcast: 

industry- standard  Virtual  Reality  Peripheral  Network  (VRPN) 
and  Trackd®,  and  NaturalPoint’s  own  NatNet  format.  VRPN, 
provided  by  the  University  of  North  Carolina’s  Department  of 
Computer  Science,  offers  several  advantages  including  a 
comprehensive  C/C++  dynamic-linked  library  (DLL)  and 
public  domain  license  [23],  For  illustration,  NaturalPoint 
provides  sample  C++  code  titled  “VRPN-Listener”  [24]  that 
affords  a  simple  capability  for  sampling  to  the  TT  port  and 
output  the  received  localization  information  to  the  screen. 


Figure  6:  Screen  Shot  of  NaturalPoint  Tracking  Tools  in 
Use. 

Control  can  be  coded  in  a  wide  variety  of  software  languages, 
each  offering  unique  benefits  and  shortcomings.  MATLAB 
offers  significant  advantages  in  terms  of  comprehensive  image 
processing  routines  and  robust  matrix  manipulation  routines  to 
support  important  functionality  such  as  volume  mapping.  In 
implementation  with  the  VRPN  construct  though,  a  significant 
challenge  arises  in  that  there  are  no  MATLAB  native 
commands  or  functions  available  to  interpret  the  VRPN 
broadcast  as  the  DLL  is  C/C++  based.  Here  we  have  realized  a 
solution  and  its  experimental  implementation  by  creating  a 
MATLAB-based  MEX-function  [25]  that  allows  the  robots’ 
MATLAB-based  control  algorithms  to  access  and  call  the  C++ 
DLL  native  to  VRPN  and  thereby  provide  a  comprehensive 
surrogate  localization  function. 


Figure  7:  Optical  Tracking  Reflectors  on  E-puck  Robots. 


MEX-fiinctions  have  an  organic  capability  offered  by 
MATLAB  that  allows  a  programmer  to  compile  software 
routines  written  in  C/C++  and  linked  to  C/C++  DLLs  resulting 
in  a  function  callable  from  within  MATLAB  programs  just  as 
one  would  call  a  MATLAB  built-in  function.  The  specific  C++ 
code  used  in  this  research  is  a  task-specific  modification  of  the 
Natural  Point-provided  “VRPN-Listener”  code  written  in  C++. 
The  capability  of  the  created  MEX-function  is  twofold.  First, 
given  identification  of  the  specific  agent  of  interest,  the  MEX- 
function  interrogates  the  VRPN  broadcast  port  and  obtains  the 
most  current  localization  information.  Secondly,  the  MEX- 
function  converts  the  provided  localization  information  into  the 
desired  coordinate  system  and  units.  For  the  cited  experiment, 
this  consisted  of  converting  the  VRPN-provided  orientation  in 
the  three-dimensional  quaternion  rotation  space  in  radians  to 
roll-pitch-yaw  parameters  in  degrees. 

UDP  Communications  for  Data  Sharing 

Multi-agent  systems  can  be  characterized  as  having  either 
distributed  or  centralized  control  with  the  distinction  being  the 
level  to  which  each  individual  agent  in  the  system  governs  its 
own  motion.  Fully  distributed  control  is  characterized  by  a 
complete  absence  of  data  or  control-command  exchange  among 
the  agents  whereby  each  agent  relies  upon  its  own  sensors  to 
understand  the  environment,  and  its  own  processing  capability 
to  determine  controlled  actions.  On  the  other  end  of  the 
spectrum,  centralized  control  is  characterized  by  a  single 
processor  calculating  the  collective  motion  of  the  system  and 
disseminating  commands  to  each  agent  for  the  orchestration 
movement.  As  a  result,  distributed  control  architectures  require 
robust  on-agent  processing  capabilities,  but  minimal 
communications.  In  contrast,  centralized  control  architectures 
are  accomplished  with  a  single  robust  processor  (the  central 
controller)  and  relatively  minimal  agent  processing.  However, 
a  robust  inter-agent  communication  capability  is  required  to 
facilitate  distribution  of  necessary  control  commands,  sensor 
information,  and  other  data  such  as  state  and  localization 
information.  As  described  control  is  a  spectrum,  one  can 
imagine  that  there  are  varying  levels  of  processing  and 
communications  required  among  the  various  control  schemes 
ranging  from  fully  distributed  to  fully  autonomous. 

The  agents  comprising  the  system  in  this  research  are 
constructed  and  defined  as  being  fully  distributed  with  each 
agent  and  its  associated  controlling  computer  fully  responsible 
for  calculating  motion  commands.  However,  there  is  a  slight 
variation  in  that  a  separate  "state"  computer  exists  to  act  as  a 
surrogate  global  positioning  system  (via  the  OptiTrack™ 
Tracking  Tools  system  and  software);  and  to  collect,  maintain, 
update,  and  distribute  state  information  (position  and 
orientation  of  each  agent  as  well  as  occupancy  mapping  of  the 
operating  space).  Because  of  this,  a  comprehensive  inter¬ 
computer  communications  infrastructure  and  protocol  was 
created  for  passing  of  data  among  the  state  computer  and 
controller  computers. 

From  a  hardware  perspective,  the  seven  computers  (one  state 
computer  plus  the  six  controller  computers)  achieve  full  gigabit 
Ethernet  inter-computer  communications  throughput  capability 
by  connection  to  a  single  DLink  DGS-2208  eight-port  gigabit 
Ethernet  switch.  That  switch  in  turn,  is  connected  to  a  Netgear 
Prosafe  FVS336G  gigabit  Ethernet  router.  That  router  provides 
connectivity  to  the  external  internet  service  provider,  but  more 
importantly,  is  set  up  to  statically  assign  internet-protocol 


addresses  to  each  of  the  experiment's  computers.  With  this 
architecture,  each  computer  can  be  uniquely  identified  by  its 
statically-assigned  IP  address  facilitating  unique  computer 
programming. 

The  second  aspect  of  providing  a  robust  infonnation  exchange 
infrastructure  is  to  select  the  underlying  data  protocol  used. 
There  are  several  protocols  available  that  can  be  used  in 
MATLAB  and  C  programming  including  notably  transmission 
control  protocol/internet  protocol  (TCP/IP)  and  user  datagram 
protocol  (UDP).  In  fact,  MATLAB's  Instrument  Control 
Toolbox  provides  a  very  comprehensive  set  of  TCP/IP  and 
UDP  command  sets.  Each  offers  its  own  advantages  and 
disadvantages.  For  example,  TCP/IP  offers  a  strong  recovery 
feature  that  consists  of  error-checking  that  will  initiate  the 
resending  of  data  packets  if  they  are  not  accurately  received. 
UDP  offers  greater  throughput  potential  due  to  its  use  of 
minimal  data  fields  for  error-checking  overhead.  However,  it 
can  suffer  some  packet  losses.  Ultimately,  UDP  was  utilized 
based  on  the  construct  of  the  validation  experiment.  In  fact, 
MATLAB's  TCP/IP  protocol  cannot  be  used  to  communicate 
between  disparate  MATLAB  instantiations,  i.e., 
communications  between  MATLAB  running  on  computer  "X" 
and  a  separate  instantiation  of  MATLAB  running  on  computer 
"Y."  The  concern  here,  however,  is  that  while  it  offers  the 
greatest  throughput,  UDP  does  not  offer  any  organic  error 
checking.  This  turned  out  not  to  be  an  issue  in  this  research  as 
error  checking  in  the  authored  code  itself  was  able  to 
accommodate  any  transmission  errors. 

5.  RESULTS 

Simulation 

Digital  simulations  showed  that  the  potential-function  path 
planning  control  architecture  provides  a  robust  and  complete 
methodology  to  explore  and  map  an  initially  unknown 
operating  space.  One  representative  experiment  included  a 
series  of  excursions  in  which  the  time  allowed  per  mission  was 
held  constant.  The  number  of  agents  comprising  the  MAS  was 
varied  from  one  to  four.  These  variant  MAS  were  then 
evaluated  in  terms  of  the  resulting  state  of  knowledge  such  as 
map  coverage  count  or  average  occupancy  probability. 
Increases  in  number  of  agents  utilized  in  sequential 
experiments  consistently  showed  a  non-linear  relationship 
between  number  of  agents  and  perforamance  criteria,  for 
example,  map  coverage  count  (Figure  8).  This  likely  is  due  to 
an  inherent  efficiency  gained  in  terms  of  distance  required  to 
travel  to  unexplored  spaces  by  having  additional  agents  in  the 
arena. 
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Figure  8:  Map  Coverage  Count  versus  Number  of  Agents. 


The  physics  of  potential  energy  functions  dictate  that  the  force 
is  inversely  proportional  to  the  square  of  the  distance.  In  this 
research,  the  energy  physics  were  replicated  as  such.  However, 
Figure  9  shows  that  this  tends  to  greatly  magnify  the  influence 
of  proximate  cells  and  greatly  diminish  distant  cells.  As  a 
result,  calculated  path  planning  was  heavily  weighted  in  a  more 
local  versus  global  sense.  This  clearly  will  influence  the 
efficiency  of  the  MAS'  ability  to  exhaustively  search  the  entire 
arena  acknowledging  that  understanding  the  entirety  of 
unexplored  space  and  accommodating  in  the  path  planning 
scheme  is  likely  more  effective  than  merely  considering  the 
radius  to  unexplored  space.  Nonetheless,  the  shortcoming  of 
this  inefficiency  is  overwhelmed  by  the  potential  function's 
ability  to  accomplish  exhaustive  search.  Ultimately,  each  cell 
will  influence  the  path  planning  calculation  insuring  that  each 
cell  ultimately  is  explored.  An  enhancement  to  consider  is  to 
normalize  cell  charges  to  and  gain  better  responsiveness  over 
the  entirety  of  the  stage  versus  proximate  cells.  Effectively, 
this  means  breaking  the  laws  of  physics  and  making  the  cell 
charge  vary  as  reduced  exponents  or  even  linearly  as  a  function 
of  distance.  Clearly,  this  deviates  from  classic  potential  theory, 
but  might  nonetheless  offer  some  interesting  advantages  in  path 
planning  efficiencies. 


Figure  9:  Total  Potential  Force  Witnessed  by  an  Individual 
Agent  in  the  Arena. 

Experimental  Validation 

The  novel  surrogate-localization  and  inter-agent  data- 
communications  solutions  developed  for  experimentation 
infrastructure  proved  to  be  robust  and  highly  conducive  to 
support  control  algorithm  cycle  rates.  Based  on  the  data 
collection  capability  and  mobility  of  the  individual  agents, 
governing  control  schemes  strive  for  cycle  times  less  than  200 
milliseconds  to  insure  exhibition  of  responsive  behaviors.  The 
surrogate  localization  function  created  only  minimal  processing 
time  burden  consuming  only  low-single-digit  milliseconds  and 
with  near-real-time  latencies.  This  proved  extremely 
supportive  of  the  control  scheme  with  the  agents  being  able  to 
accurately  represent  the  locations  of  themselves,  other  agents, 
targets,  and  obstacles  -  certainly  well  within  the  1  cm2 
resolution  of  the  state  map.  A  significant  enabler  to  this  was  an 
ability  to  calibrate  the  accuracy  of  the  OptiTrack™  system 
itself  to  sub-millimeter  levels.  Additionally,  this  function 
proved  a  robust  capability  to  provide  the  variety  of  network- 
connected  controllers  immediate  access  to  their  own  controlled 
agent,  but  also  to  the  peer  agents  operating  in  the  arena,  all  in 
near  real  time,  and  without  having  to  burden  the  inter-agent 
communications  architecture.  The  UDP-based  data- 
communications  function  provided  a  reliable  capability  to  pass 
large  datasets  of  state  information  among  the  networked 


computers.  A  single  state  map  consists  of  28,800  cells  (120  by 
240)  each  represented  by  a  64  bit  data  element.  Therefore,  the 
single  full  mapping  approaches  1.85  megabits  of  data.  Over 
the  course  of  an  individual  experiment,  hundreds  of  state  maps 
are  transmitted  to  the  agent  controllers  along  with  thousands  of 
state  map  updates  from  the  agent  controllers.  This  means  tens 
of  thousands  of  packets  navigating  the  network  among  the  state 
and  controller  computers.  Notwithstanding  UDP's  lack  of  an 
error  correction  capability,  the  data-communication  solution 
herein  resulted  in  zero  packet  loss  realized  across  more  than 
300  individual  experiments.  Certainly,  this  was  enabled  by 
isolating  the  network  on  a  single  8-port  network  switch; 
however,  also  enabled  by  careful  consideration  and  definition 
of  ports  and  buffers. 

The  resulting  experiment  infrastructure  enabled  a 
comprehensive  series  of  individual  vignettes  and  excursions 
that  explored  varying  MAS  parameters  such  as  the  number  of 
imaging  payloads,  number  of  agents  comprising  the  MAS,  and 
relative  agent  and  cell  charge  values.  These  were  evaluated 
against  mission  performance  criteria  such  as  time  to  explore  a 
selected  percentage  of  the  identified  space  or  the  number  of 
positive  detections  collected  by  proximity  sensing  or  imaging. 
With  these  parameter  and  metric  sets,  methodical  experiments 
can  be  conducted  sequentially  to  result  in  datasets  that  can  be 
reduced  and  interpreted  for  qualitative  and  quantitative 
characterization  of  the  relative  and  absolute  performance  of  the 
various  MAS  configurations. 


Figure  10:  Time  to  Complete  the  Mission  versus  Number  of 
Robots. 

The  most  interesting  conclusion  drawn  was  that  the  relative 
performance  of  a  MAS  as  a  function  of  the  number  of  agents 
comprising  the  MAS  faded  non-linearly  (Figure  10).  In  these 
series  of  experiments,  mission  success  was  defined  as 
exploring  75%  of  the  arena's  cells.  Further,  each  agent  was 
capable  to  use  its  on-board  camera  and  associated  image 
processing  capability  to  detect  and  localize  targets  in  the  arena. 
Each  agent  could  also  use  its  proximity  sensing  for  local 
collision  avoidance  and  for  obstacle  detection  and  localization. 
Excursions  were  run  whereby  the  number  of  agents  comprising 
the  MAS  was  varied  from  one  to  five  with  ten  experiment  runs 
per  excursion.  Mean  values  were  taken  across  the  ten  runs  to 
conclude  the  excursion  results.  In  an  attempt  to  curve  fit  the 
data  (annotated  by  the  diamonds),  both  polynomial  and  power 
types  were  evaluated.  A  third-order  polynomial  curve  (dotted 
line)  provided  a  better  fit  (R2=l)  within  the  bounds  of  one  to 
five  agents;  however,  did  not  extrapolate  as  values  above  six 
agents  drove  the  time  to  completion  to  negative  numbers.  A 
power  curve  fit  (dashed  line)  also  resulted  in  an  excellent  fit 


(R2=0.9898)  and  additionally  provided  better  extrapolation 
beyond  five  robots  with  6-  through  8-agent  MAS  resulting  in 
158,  137,  and  122  seconds,  respectively.  Additionally,  this 
curve  fit  has  an  asymptote  at  0  seconds  as  one  would  expect. 
Logically,  this  is  the  case  whereby  the  entirety  of  the  arena  is 
covered  by  the  concatenation  of  agents'  sensors  such  that  that 
entirety  is  exhaustively  sensed  in  the  very  first  control  cycle’s 
data  collection. 


6.  CONCLUSION 

A  solution  is  described  that  utilizes  a  novel  application  of  a 
potential-force  function  that  includes  tuning  coefficients  to 
control  mobile  robots  orchestrated  as  a  distributed  multi-agent 
system.  Further,  MAS  and  individual  agent  parameters  were 
methodically  explored  to  evaluate  and  characterize  resulting 
overarching  performance  of  the  MAS.  Experimentation  began 
via  a  software  simulation.  Then,  with  the  creation  of  novel 
localization  and  data-transfer  capabilities,  the  experimentation 
migrated  to  a  hardware  implementation.  Results  provide 
valuable  insight  to  the  behavior  of  the  various  MAS 
configurations  such  that  a  priori  mission  planning  can  be 
optimized  as  a  function  of  prioritized  and  weighted  criteria  and 
planning  factors.  Notably,  there  is  a  distinct  tradeoff  between 
MAS  capability  and  consumption  such  that  optimal 
configuration  can  indeed  be  selected  a  priori  to  conduct  of 
missions. 
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Self-sorting  in  a  swarm  of  heterogeneous  agents 


Manish  Kumar,  Devendra  P.  Garg,  and  Vijay  Kumar 


Abstract —  Sorting  of  heterogeneous  units  is  a  self-organized 
behavior  which  is  seen  in  many  biological  systems.  One  of 
the  best  examples  of  such  systems  is  a  system  of  biological 
cells  of  heterogeneous  types  that  has  the  ability  to  self-organize 
into  specific  formations,  form  different  types  of  organs  and, 
ultimately,  develop  into  a  living  organism.  Earlier  research 
in  this  area  has  indicated  that  such  self-sorting  behaviors 
in  biological  cells  and  tissues  are  made  possible  because  of 
difference  in  the  adhesivity  between  different  types  of  cells 
or  tissues.  Inspired  by  this  differential  adhesivity  model,  this 
paper  presents  a  decentralized  approach  based  on  differential 
artificial  potential  to  achieve  the  self  sorting  behavior  in  a 
swarm  of  heterogeneous  robotic  agents.  The  method  is  based 
on  the  proposition  that  agents  of  different  types  experience 
different  magnitude  of  potential  while  they  are  interacting 
with  agents  of  different  types.  An  analysis  of  the  system 
with  the  proposed  approach  in  Lyapunov  sense  is  carried 
out  for  stability.  Extensive  simulation  studies  and  numerical 
analysis  suggest  that  the  proposed  method  would  always  lead 
a  population  of  heterogeneous  agents  closer  to  the  sorted  or 
segregated  configuration. 

I.  Introduction 

Formation  control  of  multiple  autonomous  vehicles  has 
received  attention  of  several  researchers  working  in  the  area 
of  mobile  robotics  because  of  its  potential  applications  in 
a  number  of  fields  including  cooperated  search  and  rescue 
operation,  surveillance,  reconnaissance,  and  boundary  pro¬ 
tection.  Advancement  in  communication  and  sensing  tech¬ 
nologies,  and  in  computing  resources  have  made  it  possible 
to  coordinate  the  movement  of  several  autonomous  vehicles 
working  cooperatively  to  achieve  certain  mission.  One  of  the 
very  first  applications  of  formation  control  of  multiple  agents 
was  behavioral  simulation  of  flocks  of  birds,  herd  of  animals 
and  schools  of  fish  for  computer  graphics  by  Reynolds  [17]. 
He  stated  three  simple  behaviors  that  lead  to  flocking  in  birds 
and  fish:  collision  avoidance,  velocity  matching,  and  flock 
centering  (in  decreasing  order  of  precedence).  The  biggest 
merit  of  Reynolds’  approach  was  that  these  behaviors  were 
based  on  observations  of  local  environment  and  interactions 
on  a  local  scale  that  could  be  fully  implemented  in  individual 
agents.  These  local  interactions  among  agents  resulted  in 
global  flocking,  schooling,  and  herding  behaviors  which  were 
totally  scalable. 
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Drawing  inspiration  from  Reynolds’  approach,  many  re¬ 
searchers  have  focused  on  designing  decentralized  con¬ 
troller  for  achieving  flocking  behavior.  The  examples  in¬ 
clude  behavior-based  methods  [2],  leader-follower  technique 
[5],  [6],  method  based  on  formation  constraint  and  virtual 
leaders/beacons  [7],  Lyapunov  function  based  methods  [13], 
[14],  [15],  [16],  The  concept  of  artificial  potential  has  been 
used  in  robotics  by  many  researchers.  For  example,  artificial 
potential  has  been  used  for  path  planning  [24],  manipulator 
control  [23],  robot  navigation  [12],  and  obstacle  avoidance 
[11],  and  multi-robot  formation  control  [13],  [15]. 

Obtaining  a  desired  shape  and  pattern  of  the  formation 
can  be  critical  for  a  mission  relying  on  coordinated  action 
by  multiple  mobile  agents.  For  example,  if  a  large  number 
of  robots  need  to  be  deployed  to  perform  complicated  tasks 
such  as  surveillance  of  large  area,  perimeter  protection  of 
vital  installation,  or  surrounding  site  of  a  chemical  or  haz¬ 
ardous  waste  spill,  the  robots  must  be  able  to  autonomously 
organize  themselves  in  certain  formation,  pattern,  or  shape. 
In  many  situations,  it  may  not  be  possible  to  integrate  all  the 
capabilities,  sensing  or  actuation,  required  for  different  kinds 
of  tasks  in  an  individual  robot.  Accordingly,  the  robots  may 
have  heterogeneous  abilities  for  sensing  and  actuation  that 
will  enable  them  to  perform  specific  tasks.  Heterogeneous 
robots  must  be  able  to  self-organize  themselves  in  a  mission 
specific  manner  to  carry  out  tasks  assigned  to  them.  The 
main  contribution  of  this  paper  is  synthesis  and  analysis  of 
a  controller  that  allows  the  robots  to  segregate  or  sort  so 
that  they  form  separate  groups  comprising  of  homogeneous 
robots. 

II.  Segregation  and  Sorting  Behaviors  in 
Biology 

Sorting  is  a  phenomenon  which  is  seen  in  several  biolog¬ 
ical  systems.  Examples  include  brood  sorting  by  ants  [4], 
segregation  in  amphibian  larvae  based  on  kinship  [10],  and 
aggregation  /segregation  behaviors  in  cockroaches  [1]  based 
on  odors  of  strains.  Sorting  of  cells  based  on  their  types  and 
functionalities  is  one  of  the  best  examples  of  sorting  in  bio¬ 
logical  systems.  Cell  sorting  is  one  of  the  basic  phenomenon 
which  leads  to  formation  of  patterns  and  organs  in  living 
organisms.  Study  of  formation  of  patterns  in  living  organisms 
is  called  morphogenesis.  The  mechanisms  by  which  these 
patterns  form  can  provide  valuable  insights  for  distributed 
problem  solving  strategies.  Most  of  the  strategies  or  models 
in  literature  that  can  explain  formation  of  patterns  rely  on  dif¬ 
ferential  attraction/inhibition.  For  example,  Swindale’s  model 
[20]  accounts  for  formation  of  ocular  dominance  stripes  in 
visual  cortex  based  on  local  activation  and  lateral  inhibition 
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(LALI)  mechanism  [8]  for  like  type  of  synapses,  and  local 
inhibation  and  lateral  activation  (LILA)  [3],  [20],  the  reverse, 
for  the  unlike  type  of  synapses.  Reaction-diffusion  [22]  is  a 
model  to  mathematically  represent  the  transport  phenomenon 
in  biological  and  natural  systems.  This  model  tries  to  explain 
the  interaction  of  particles  with  the  environment  and  their 
motion  in  space.  In  early  1990s,  Graner  and  Glazier  [9] 
proposed  a  lattice  based  modified  version  of  large-Q  Potts 
model  with  differential  adhesivity  to  explain  and  simulate 
the  sorting  of  a  mixture  of  two  types  of  biological  cells.  In 
fact,  it  has  been  long  known  [19]  that  it  is  the  difference 
in  intercellular  adhesivity  that  leads  to  sorting  in  cells. 
The  final  state  of  cell  configuration  is  achieved  when  the 
overall  surface  energy  is  globally  minimized.  Based  on  this 
principle,  Steinberg  [19]  postulated  that  cells  are  sorted  i.e., 
two  types  of  cellular  units  A  and  B  are  segregated  when: 

wAB  <  +  (,) 

where  W aa  and  Wbb  represent  the  work  of  cohesion 
between  particles  or  cells  of  same  types  (i.e.,  between  types 
A  &  A,  and  B  &  B  respectively),  and  Wab  represents  the 
work  of  adhesion  between  cells  of  types  A  and  B.  The 
method  for  self-sorting  in  artificial  mobile  agents  presented 
in  this  paper  is  motivated  by  this  differential  adhesivity 
phenomenon  observed  in  biological  systems  that  leads  to 
sorting. 

III.  Problem  Formulation 

The  group  of  mobile  agents  consists  of  N  fully  actuated 
agents,  each  of  whose  dynamics  is  given  by  the  double 
integrator: 


configuration  of  agents  where  the  average  distance  between 
the  agents  of  like  types  (type  A  or  type  B)  is  less  than  the 
average  distance  of  agents  between  the  unlike  types  (between 
agents  of  type  A  and  type  B).  Alternatively, 


AA  <  AB  BB  <  AB 

avg  ^  avg">  avg  ^  avg 


(4) 


where  r^g  is  the  average  distance  between  agents  of  types 
X  and  Y. 


IV.  Control  Law  Formulation 

This  section  presents  the  control  law  which  causes  a 
population  of  heterogeneous  agents  to  asymptotically  flock 
as  well  as  segregate.  For  a  system  of  N  mobile  agents  with 
Na  agents  of  type  A  and  Nb  agents  of  type  B,  following 
feedback  control  law  is  considered: 

ui  =  -  hi  -  9*11)  Pi)  (5) 

feiVi  j£Ni 

where  u,  is  the  control  input  to  the  agent  i,  N,  is  the  set 
of  agents  in  the  neighborhood  of  agent  i,  Vij ( 1 1 q3  —  Qi||)  is 
the  artificial  potential  of  interaction  between  agents  i  and 
j,  \\q3  —  q,\\  is  the  norm  of  vector  ( q3  —  qi)  representing 
the  euclidean  distance  between  agent  i  and  j ,  and  V9i 
is  the  gradient  with  respect  to  coordinates  of  agent  i  i.e., 
qi.  First  term  in  equation  (5)  represents  the  gradient  of 
potential  function,  and  the  second  term  represents  damping 
and  causes  the  agents  to  match  their  velocities  with  each 
other.  The  artificial  potential  is  a  non-negative  function  of 
relative  distances  between  a  pair  of  neighbors  given  by 
Vl}  (qi ,  q.j )  :  R2m  — ►  i?> 0.  Artificial  potential  function, 
Vij  due  to  interaction  between  two  agents  i  and  j  can  be 
expressed  [13]  as: 


qi  =  Pi 

Pi  =  Ui(t)  *  =  1,2,  ...,7V  (2) 

where  qt  and  p,  are  m-dimensional  position  and  velocity 
vectors  respectively  of  agent  i.  The  group  of  mobile  agents, 
considered  in  this  paper,  consists  of  two  different  types  of 
agents:  type  A  and  type  B.  The  number  of  agents  of  type  A 
is  Na  and  that  of  type  B  is  Nb  such  that  N  =  Na  +  Nb  ■ 
The  objective  of  this  paper  is  to  synthesize  a  controller  that 
can  asymptotically  flock  and  separate  the  robots  of  type  A 
and  type  B  into  two  different  groups  (referred  to  as  sorting 
or  segregation).  Agents  are  said  to  flock  (asymptotically) 
when  all  agents  achieve  the  same  velocity  vector,  distances 
between  the  agents  are  stabilized,  and  no  collisions  occur. 
Let  us  try  to  precisely  define  the  term  Segregation.  A  group 
of  agents  of  types  A  and  B  are  said  to  be  segregated  if  there 
exists  a  hyperplane  that  separates  the  two  different  types  of 
agents.  Alternatively: 

3e  :  ((qj  —  qi),e)  >0  V*  £  A  &  j  £  B  (3) 

where  (a,  b)  means  inner  product  between  vectors  a  and 
b.  This  implies  that  every  agent  of  type  A  is  on  one  side 
of  the  hyperplane,  and  that  of  type  B  is  on  the  other  side 
of  the  hyperplane.  For  the  sake  of  simplicity,  we  relax 
this  definition  and  alternatively  define  Segregation  to  be  a 


a  (in(qij)  +  if  0  <  qjj  <  di 

>  q:\J  (6) 

a  ^n(di)  +  if  qij  >  d\ 

where,  a  is  a  scalar  control  gain,  and  qi3  =  \\q3  —qi\\-  The 
parameters  do  and  <l\  respectively  represent  the  inter-agent 
distance  below  which  (i.e.  when  ql3  <  do)  the  interaction 
force  is  repulsive  (negative)  and  above  which  (i.e.  when 
qij  >  d\ )  the  interaction  force  is  zero.  Figure  1  shows  the 
potential  function  plotted  against  the  inter-agent  distance. 
As  indicated  in  the  figure,  the  potential  becomes  minimum 
when  the  inter-agent  distance  is  do .  The  interaction  between 
agents  happen  with  the  help  of  sensing  or  communication 
devices.  The  parameter  d±,  then,  can  be  regarded  as  the 
sensing  or  communication  range.  Without  loss  of  generality, 
in  this  paper  the  parameter  d\  is  considered  infinity  so  that 
each  agent  can  interact  with  the  rest  of  the  agents.  Results 
presented  in  this  paper  remain  unaffected  for  finite  d\  if 
connectedness  of  the  underlying  graph  of  the  system  is 
assumed.  Equation  (6),  under  such  a  condition,  can  be  simply 
written  as: 

Vij  =  a  (ln(qij)  +  ^ j  <7) 

The  basis  for  controller  synthesis  in  this  paper  is  the 
parameter  do-  Since  there  are  two  types  of  mobile  agents 
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Interaction  Potential  Interaction  Force 


Fig.  1.  Interaction  Potential  versus  Inter- Agent  Distance 

involved  in  the  system,  there  are  three  different  kinds  of 
artificial  potentials  involved:  a)  Potentials  arising  due  to 
interaction  between  types  A  and  A,  b)  Potentials  arising 
due  to  interaction  between  types  B  and  B,  and  c)  Potentials 
arising  due  to  interaction  between  types  A  and  B.  In  this 
paper,  potentials  arising  due  to  interaction  between  the  same 
types  are  considered  to  be  same,  i.e.,  =  VBB .  The 

term  V. \AA  is  given  by: 

/  dAA\ 

ViAA  =  a  yln(qij)  +  (8) 

and  the  term  V^B  is  given  by: 

VAB  =  a  ( Zn(*j)  +  -2—  )  (9) 

V  Qij  J 

The  control  law  for  segregation  can  be  achieved  when: 

dAA  =  dBB  <  dAB  (10) 

Figure  2  shows  the  plot  of  force  of  interaction  due  to  sim¬ 
ilar  types  and  due  to  dissimilar  types  of  robots  versus  inter¬ 
agent  distance  when  the  condition  for  segregation  controller 
(10)  is  met.  In  this  case,  it  can  be  seen  that  the  interaction 
force  between  agents  of  same  types  is  greater  than  that  of 
force  between  agents  of  different  types  at  any  given  distance. 
Hence,  this  method  of  segregation,  based  on  differential 
potential,  is  analogous  to  Steinberg’s  [19]  explanation  of  cell 
sorting  based  on  differential  adhesiveness  (see  Equation  (1)). 

V.  Controller  Analysis 

In  this  section,  we  carry  out  an  analysis  of  convergence 
and  stability  properties  of  the  system  of  multiple  agents 
obeying  dynamics  given  by  Equation  (2)  under  control  law 
given  by  Equation  (5).  In  order  to  carry  out  stability  analysis 
of  the  collective  motion  of  agents,  the  following  positive 
definite  function  can  be  chosen  as  the  Lyapunov  function: 

</>(q,  P)  =  V (q)  +  *  pTp  (11) 

where  q  £  RmN  is  stacked  position  vector  of  all  agents, 
p  £  RmN  is  stacked  velocity  vector  of  all  agents,  and 


Fig.  2.  Interaction  Force  between  Agents 

V(q)  :  RmN  — ►  R>0  is  the  total  potential  energy  of  the 
system.The  total  potential  of  the  system  consists  of  three 
parts:  1)  Potential  due  to  interaction  of  agents  of  type  A,  2) 
Potential  due  interactions  amongst  agents  of  types  A  and  B, 
and  3)  Potential  due  to  interaction  of  agents  of  type  B.  This 
can  be  written  as: 

V(q)  =  v  AA  (q)  +  v  BB  (q)  +  v  AB  (q) 

=  *E  E  ^(ii^-ftii)+EE^(ii^-®i!) 

i€.Aj£A,j^i  i(zAj£B 

+*E  E  va(H-<n\\)  (i2) 

i€B  j€.B,jj^i 

The  collective  dynamics  of  the  system  can  be  given  by: 

q  =  p  (13) 

p  =  -  W(q)  -  L(q)p  (14) 

where  L{ q)  £  RmNxrnN  js  m-dimensional  graph  Laplacian 
(see  reference  [15]).  Among  other  important  properties  of 
graph  Laplacian  matrix  L( q),  it  is  a  positive  semi-definite 
matrix. 

Lemma  5.1:  Consider  a  system  of  N  mobile  agents.  Each 
of  the  agents  follows  dynamics  given  by  Equation  (2),  and 
with  feedback  control  law  given  by  Equation  (5).  For  any 
initial  condition  belonging  to  the  level  set  of  </>(q,  p)  given 
by  Lie  =  {(q,  p)  :  </>(q,  p)  <  C)}  with  C  >  0,  and  when  the 
underlying  graph  of  the  system  is  connected  and  cohesive, 
then  the  system  asymptotically  converges  to  an  invariant  set 
Lli  C  Lie  such  that  the  points  in  Llj  have  a  velocity  that  is 
bounded  and  velocity  of  all  agents  match. 

Differentiating  </>( q,  p)  with  respect  to  time  and  using 
Equation  (14)  one  gets: 

0(q,p)  =  pTVP(q)  +  pTp 

=  pTW(q)  +  pT(-VH(q)  -  L(q)p) 

=  -pTC(q)p  <  0  (15) 

since  L(q)  is  a  positive  semi-definite  matrix.  From 
Lasalle’s  Invariance  Principle,  all  solutions  of  the  system 
starting  in  Lie  will  converge  to  the  largest  invariant  set 
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tti  =  I  (q,  p)  £  flc  :  <J(q,  p)  =  0)  j,  and  this  happens  when 
the  velocity  of  all  agents  match.  For  a  detailed  proof  of  this 
lemma,  please  see  references  [13],  [15].  Also,  equilibrium 
condition  is  achieved  when  p  =  0.  Since  there  are  no 
external  forces  acting  on  the  agents,  the  velocities  of  all 
agents  become  zero  (the  center  of  mass  of  the  system  does 
not  move),  i.e.  p  =  0.  Hence,  equilibrium  is  achieved  when 
the  total  potential  of  the  system  is  at  extremum.  This  leads 
to  the  following  proposition. 

Proposition  5.2:  When  control  actions  lead  to  the  global 
minimization  of  potential,  then  the  system  is  segregated  if 
the  condition  for  segregation  controller  (10)  is  applied. 

Since  each  term  of  V(q)  is  a  non-negative  term,  the 
global  minimum  is  reached  when  each  individual  term  of 
the  expression  (12)  is  minimum  (assuming  each  term  is 
independent  of  the  other).  This  happens  when: 


Proposition  5.3:  If  we  consider  one  dimensional  case,  i.e., 
q,p  £  R,  then  the  system  of  heterogeneous  swarming  agents 
following  dynamics  (2)  and  control  law  (5)  flock  together 
such  that  the  average  distance  between  the  agents  of  different 

types  is  greater  than  or  equal  to  the  parameter  dA 11 ,  i.e., 

ab  >  jAB 

I  avg  £,  a0  ' 

This  proposition  can  be  proved  using  equation  (21)  in  the 
following  manner.  Let  us  write: 

/if  =FljB(qj-qi)  (22) 

Let  us  assume  that  out  of  uaub  possible  terms  of  f{tjB , 
m  are  the  terms  for  which  q^  —  rij  >  dAB  and  n  (m  +  n  = 
n-An-B)  are  the  terms  for  which  </,;7  =  rij  <  dAB .  Hence, 
equation  (21)  can  be  written  as: 


*2: 

II 

^cT 

1 

-  dAA 

Vi  £  A,  j  £  A 

1 

c? 

II 

sT 

-  dBB 

Vi  £  B,  j  £  B 

1 

c? 

II 

-  dAB 

—  ao 

Vi  £  A,  j  £  B 

(16) 

Hence  if  condition  (10)  for  controller  is  applied,  we  will 
have  the  segregation  given  by  condition  (4). 

However,  the  system  of  agents  with  a  global  minimum 
configuration  where  all  individual  potentials  are  minimum 
is  not  achievable.  Problems  if  this  nature,  referred  to  as 
graph  embeddings  [18],  have  been  extensively  studied  in 
topological  graph  theory.  More  likely,  the  system  will  reach 
a  local  minimum  condition  which  is  given  by: 

V  V  (q)  =  0  (17) 

The  partial  derivative  given  by  equation  (17)  for  an  agent 
i  £  A  is  given  by  the  equation: 


E^B+E^B  =  0  <23) 

k- 1  fc= 1 

Let  us  assume  that  dAB  +  xc  is  the  mean  distance  for 
the  terms  for  which  rij  >  dA B,  and  dA B  —  xl:j  is  the  mean 
distance  for  the  terms  for  which  r,.y  <dAB.  Hence,  equation 
(23)  can  be  written  as: 

m  (/fc| d£z+xc)  +  n  ( fk\dAB_Xc, )  =  0  (24) 

Subsituting  from  equations  (20)  and  (22): 

m  n 

/  i/l  D  j  /  jAB  V?  ^C'  ^  (^) 

(dfiB  +  xc)2  (dfiB-xct)2 

and  since  (dAB  +xc)  >  ( d AB  —xc>),  we  have  from  equation 
(25): 

mxc  >  nxci  (26) 


v*nq)=  E 


-E« 

j£B 


j£A,j^i 

1 

IWdi-QjW 


1 


dAA 

a0 


I  Qi  -  Qj  I 

dAB 

ao 

H  -  dj\\ 


hi  -  Qj\ 

(gj  ~  Qi) 

hi  -  gjl! 


fa  -  g i) 

II  gi  -  gjf 


The  average  distance  between  agents  of  type  A  and  type 
B  is  given  by: 


=  0 


(18) 


Equation  (18)  is  also  a  force  balance  equation  for  an  agent 
i  £  A  and  can  be  re-written  as: 


„ab 


1 


m  +  n 


[m(dQ- B  +  xc)  +  n(do  B  -  xc'j) 


=  d. 


AB 


1 


m  +  n 


(mxc  —  nxc‘ )  >  g(q  (27) 


E  ^fe-go  +  E^fe-  ®)  =  o 

j&A.j^i  jGB 

where 


pAA  _ 

l 

dAA 

Jig* -gill2 

l|g*-gill3. 

j?AB  _ 

l 

dAB 

C4 

1 

_ 1 

CO 

1 

c? 

(19) 


(20) 


If  we  sum  up  equation  (19)  for  all  i  £  A,  and  noting  that 
FAA  =  Fj\A,  then  we  will  obtain  the  following  equation: 


E  E  -  di)  =  °  (2D 

i&A  jeB 

Equation  (21)  leads  to  the  following  proposition: 


dAB  is  a  design  parameter  that  can  be  chosen  to  be 
arbitrarily  large  value,  and  since  rABg  is  always  greater  than 
dAB,  rAB  can  be  made  arbitrarily  high. 

Let  us  now  examine  the  two-dimensional  case  when  q,p  £ 
R2.  Since  F^B  is  a  scalar  quantity,  in  two  dimensional  case, 
equation  (21)  can  be  equivalently  written  into  the  following 
two  scalar  equations: 

EE^i  =  ° 

i&A  j£B 

iGAjGB 

where  xtj  =  (xj  -  Xi)  and  yl3  =  (y7  -  yi),  xt  and  yt  are 

coordinates  along  X  and  Y  axis  for  agent  i.  Hence  (/,7  = 


(28) 

(29) 
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y  (x?j  +  y'f,j ) .  Let  us  write  equations  (28)  and  (29)  as  the 
following: 


^  ^  H-  ^  ^  FkXk  —  0 

k:Fk>  0  k:Fk<  0 

(30) 

F  FkVk  +  ^2  FkUk  =  0 

k:Fk>  0  k:Fk<.  0 

(31) 

If  we  assume: 

y;  Fkxk  =  ci 

k:Fk>  0 

(32) 

then 

y  Fkxk  =  -ci 

k:Fk  <0 

(33) 

and  similarly. 

y  Fkyk = c2 

k :  Fk  >  0 

(34) 

y  Fkyk  =  c2 
k:Fk<0 

(35) 

Then  from  Cauchy-Schwarz  inequality  relation, 
(32)  -  (35),  and  noting  that  qk  =  yjxk  +  yk): 

equations 

||F||Ffe>0||Q1||>v/cf  +  ci  =  c 

(36) 

||F||Ffc<o||Q2||>^cf  +  cl  =  c 

(37) 

where  F  =  {Fi,F2...},  Q  =  {gi, <?2...},  and  c  >  0. 

Again  from  Cauchy-Shwarz  inequality  relations  and  equa¬ 
tions  (36)  and  (37),  it  is  evident  that: 

0  <  y  Fkqk  <  c 

Fk>  0 

(38) 

-c  <  y  Fkqk  <  0 

Fk<0 

(39) 

VI.  Simulation  Results  and  Discussions 
Extensive  simulations  were  carried  out  to  verify  the  re¬ 
sults  obtained  in  the  previous  sections.  In  the  simulations, 
following  parameters  were  assumed  : 

d£A  =  d£B  =  3,  and  dAB  =  6 

Figure  3  shows  the  configuration  of  a  population  of  20 
agents  (10  each  of  type  A  and  B)  in  a  2D  space  at  different 
times  during  the  simulation.  The  agents  started  off  at  a 
random  configuration,  and  control  law  given  by  equation  5 
based  on  differential  potential  was  applied  to  the  agents.  The 
final  configuration  at  time  T=750  sec  shows  that  the  agents 
of  types  A  and  B  form  two  separate  groups. 


X  Axis  (m)  X  Axis  (m) 


Fig.  3.  Configurations  of  Agents  at  Times  T=0  (top  left),  T=250  sec  (top 
right),  T=5()0  (bottom  left),  and  T=750  sec  (bottom  right) 

Figure  (4)  shows  the  plot  of  average  distances  between 
agents  of  types  A  and  A  (rAA),  B  and  B  (rBB;)),  and  A 
and  B  (rABg)  versus  time  for  the  above  simulation.  At  the 
final  configuration,  the  average  distances  rAAg,  rBJg ,  and  rABg 
were  found  out  to  be  3.21,  3.22,  and  6.97  respectively,  which 
clearly  shows  that  the  population  was  segregated  based  on 
the  condition  given  by  (4). 


From  inequalities  (38)  and  (39),  we  can  write: 

-c<  £  Fkqk  +  y  Fkqk  <  c  (40) 

Fk>0  Fk<0 


Using  similar  analysis  used  to  prove  Lemma  (5.3),  we  can 
show  that: 


mxc  —  nxc>  >  — c 


(41) 


where  m  is  the  number  of  terms  for  which  fk  >  0,  and  n 
is  the  number  of  terms  for  which  fk  <  0.  This  leads  to  the 
following: 


rAB  ^  „ 
avg  —  ^0 


>dAB  - 


(42) 


Average  Distances 


The  value  c  in  the  above  inequality  is  bounded  because 
the  quantities  C\  and  C2  in  equations  (32)  and  (34)  are 
bounded.  Since  c  is  bounded,  we  can  always  choose  dAB 
to  be  arbitrarily  large  making  rABg  to  be  arbitrarily  large. 


Fig.  4.  Average  Distances  Between  Agents  of  Types  A  and  A  (rA^),  B 
and  B  (rf„®),  and  A  and  B  {rA^g) 

The  results  given  above  were  for  just  one  simulation 
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run.  In  order  to  verify  that  the  method  presented  in  this 
paper  leads  to  segregation  in  general  in  a  population  of 
heterogeneous  agents,  an  extensive  simulation  study  was 
carried  out  in  which  more  than  100  atns  were  performed. 
Figure  5  shows  the  average  distance  between  agents  at 
steady-state.  In  each  of  the  simulation  runs,  the  population 
of  agents  consisting  of  types  A  and  B  was  initialized  in  a 
random  configuration  obtained  via  uniform  distribution  of 
agents  in  2D  space,  and  number  of  agents  of  type  A  and  B 
were  each  chosen  randomly  between  values  5  and  15.  Each 
of  the  runs  was  carried  out  for  500  seconds  of  simulation 
time.  The  average  distances  between  agents  shown  in  the 
figure  are  calculated  at  the  steady  (final)  state.  It  can  be 
easily  seen  that  the  average  distance  between  agents  of  type 
A  (j'aiTg)  and  average  distance  agents  of  type  B  (rB^)  is 
less  than  the  average  distance  between  agents  of  type  A  and 
B  (j'avg)  f°r  each  °f  the  simulation  runs.  Also,  it  is  evident 
from  the  figure  that  r^Bg  is  always  greater  than  the  parameter 
cIqB  supporting  our  result  from  proposition  5.3.  Moreover, 
in  each  of  the  simulation  runs,  the  agents  were  completely 
segregated  and  satisfied  the  separating  hyperplane  condition 
of  segregation  given  by  equation  (3). 


Average  Distances 


Fig.  5.  Average  Distance  between  Agents 

The  method  presented  in  this  paper  is  equally  applicable 
to  more  than  two  types  of  agents  and  in  higher  dimensional 
space. 

VII.  Conclusions 

The  paper  presents  a  decentralized  method  to  achieve  self 
organized  bahavior  of  sorting  or  segregation  in  a  population 
of  heterogeneous  agents.  The  method  is  based  on  the  concept 
of  differential  artificial  potential.  The  paper  presents  the 
stability  analysis  of  a  population  of  agents  in  Lyapunov 
framework,  and  lays  down  an  analytical  foundation  for 
synthesis  of  controllers  for  self-sorting  in  artificial  potential 
function  framework.  Specifically,  condition  for  the  synthe¬ 
sis  of  controllers  for  sorting  is  analyzed  in  one  and  two 
dimensional  space.  Extensive  simulation  studies  verify  the 
results  obtained  in  this  paper,  and  shows  the  effectiveness  of 
the  proposed  method  in  achieving  the  sorting  behavior  in  a 
swarm  of  heterogeneous  agents. 
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A. 2. 2  Role  of  Stochasticity  in  Self-Organization  of  Robotic 
Swarms 

The  following  paper  (on  the  next  6  pages)  was  co-authored  by  Manish  Kumar, 
Dejan  Lj.  Milutinovic,  and  Devendra  P.  Garg,  and  appeared  in  the  Proceedings 
of  the  2008  American  Control  Conference  on  pages  123-128,  published  and 
presented  in  June  2008. 
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Role  of  Stochasticity  in  Self-Organization  of  Robotic  Swarms 


Manish  Kumar,  Dejan  Milutinovic  and  Devendra  P.  Garg 


Abstract — This  paper  investigates  the  effectiveness  of  de¬ 
signed  random  behavior  in  self-organization  of  swarm  of  robotic 
agents.  Inspired  by  the  self-organization  observed  in  biological 
cells  and  the  role  played  by  random  forces  in  providing 
robustness  in  cell  self-organization,  we  investigate  the  possibility 
of  designing  a  decentralized  controller  for  a  swarm  of  agents  in 
which  the  stochastic  process  is  included.  This  paper  considers 
flocking  as  a  self-organizing  behavior  example  to  validate 
our  findings.  The  controller  is  designed  in  the  framework  of 
Lyapunov  function,  and  it  is  based  on  the  artificial  potential 
due  to  interactions  among  agents.  Our  analysis  shows  that  the 
flocking  behavior  of  the  swarm  is  improved  and  is  more  robust 
when  the  stochastic  process  is  included  in  the  agent  controller. 

I.  Introduction 

A  swarm  of  robotic  agents  performing  coherent  activities 
while  controlled  in  a  decentralized  manner  is  an  example 
of  a  self-organized  system  [12].  There  are  several  examples 
in  nature  where  individual  units  carry  out  operations  based 
on  local  interaction  and  local  information  without  a  complete 
knowledge  of  other  units’  operation.  Yet  the  overall  emerging 
behavior  of  the  system  appears  to  be  highly  organized, 
coherent,  and  efficient  in  achievement  of  its  objectives.  The 
best  examples  of  such  systems  are  populations  of  biological 
cells  that  possess  the  ability  to  self-organize  into  specific 
formations,  form  different  types  of  organs  and,  ultimately, 
develop  into  a  living  organism.  Most  importantly,  their  ability 
to  self-organize  is  extraordinarily  robust. 

The  cell  behavior  is  guided  by  biochemical  signals  and 
the  structure  of  the  environment.  Intracellular  biochemical 
signaling  networks  are  involved  in  the  detection  of  the 
environment  and  they  drive  the  cell  behavior,  their  function 
and  motility.  The  role  of  signaling  networks  for  cells  is 
similar  to  the  one  that  the  robot  hardware  plays  for  the 
algorithms  that  guide  the  robot  communication  and  sensory 
systems,  as  well  as  the  robot  behavior  [10].  Because  of  this, 
the  study  of  self-organized  cellular  systems  is  more  relevant 
for  the  design  of  robotic  swarms  as  compared  to  the  study 
of  biological  multi-agent  systems,  such  as  flocks  of  birds  or 
schools  of  fish,  in  which  the  presence  of  natural  intelligence 
among  biological  entities  cannot  be  ruled  out.  In  this  respect, 
the  design  of  agent  swarms  inspired  by  a  self-organized 
cellular  system  is  a  very  promising  direction  of  research. 
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This  paper  investigates  the  role  random  forces  can  play 
in  attaining  robust  behaviors  in  swarm  systems.  The  paper 
draws  inspiration  from  the  role  of  randomness  in  several 
search  algorithms  and  in  several  biological  systems.  Ran¬ 
domness  has  been  widely  used  as  a  component  in  sev¬ 
eral  stochastic  search  algorithms.  For  example,  evolutionary 
search  algorithms  [5]  have  heavy  stochastic  components  (e.g. 
choice  of  chromosomes  for  reproduction,  crossover,  and 
mutation  are  driven  by  probabilities  based  on  their  fitness 
function).  Reinforcement  learning  techniques  [18]  have  two 
components:  exploitation  and  exploration.  Exploration  is  a 
random  walk  in  the  search  space  that  makes  the  algorithm 
investigate  new  regions.  Apart  from  randomness  being  tradi¬ 
tionally  used  in  search  and  optimization  algorithms,  random 
components  have  shown  to  play  critical  role  in  modeling 
many  biological  systems.  Bateson  [3]  calls  the  mind  a 
stochastic  system  and  cognitive  learning  process  a  stochastic 
process.  Contemporary  cognitive  scientists  consider  mental 
processes  as  stochastic  processes  such  as  evolutionary  algo¬ 
rithms  where  hypotheses  or  ideas  are  proposed,  tested,  and 
either  accepted  or  rejected  by  a  population.  Random  or  trial- 
and-error  learning  techniques  provide  ways  to  create  new 
varieties  of  solutions  for  problems.  Random  behavior  is  ubiq¬ 
uitous  in  biological  systems.  Chaotic  behavior  of  a  hooked 
fish,  random  behavior  among  preys  for  predator  avoidance, 
and  zig-zagging  of  a  chased  rabbit  through  a  meadow  are  all 
examples  of  existence  and  heavy  use  of  random  behaviors 
among  animals.  Lorenz  [9],  in  his  intuitive  chapter  entitled 
’’Oscillation  and  Fluctuation  as  Cognitive  Functions”,  has 
described  the  importance  of  a  random  behavior  in  organisms’ 
motion  for  search,  as  well  as  for  escaping  dangers.  We  use 
this  kind  of  random  behavior  as  a  component  in  the  control 
laws  for  swarm  robotic  systems. 

In  this  paper,  we  consider  flocking  which  is  the  most  in¬ 
vestigated  example  of  swarm  self-organization.  Flocking  is  a 
self-organized  behavior  in  which  agents,  initially  distributed 
over  the  operating  space,  group  together  and  organize  into 
a  specific  formation.  An  example  of  this  kind  of  flocking 
is  the  formation  of  germinal  centers  inside  the  lymph  node 
[16].  Flocking  has  been  extensively  studied  in  multi-agent 
literature.  Reynolds  [15]  has  been  able  to  reproduce,  in  his 
computer  models,  behaviors  representing  flocking  in  birds 
and  schooling  in  fish  using  simple  rules  based  on  local  inter¬ 
actions  among  agents.  Drawing  inspiration  from  Reynolds’ 
approach,  many  researchers  have  focused  on  designing  a 
decentralized  controller  for  achieving  flocking  behavior  [2], 
[4],  [13],  [19],  A  control  system  based  on  the  methods 
presented  in  these  references  should  be  able  to  yield  a  single 
flock  of  agents  based  on  local  information.  However,  if 
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the  local  information  can  only  be  collected  over  a  finite 
range,  it  can  lead  to  the  formation  of  more  than  one  flock. 
This  paper  studies  scenarios  in  which  local  interaction  leads 
to  the  fragmentation  of  groups  and  investigates  how  the 
introduction  of  random  processes  in  an  agent  controller  helps 
to  eliminate  or  alleviate  the  problem  of  fragmentation. 

In  this  paper,  we  consider  flocking  in  the  Lyapunov  func¬ 
tion  framework.  The  Lyapunov  function  for  the  controller 
design  of  a  multi-agent  system  is  based  on  the  artificial 
potential  function  of  agent  interactions  [8],  [14],  Being 
inspired  by  the  role  played  by  the  random  fluctuation  in 
driving  self-organizing  processes  in  cells,  we  propose  to 
include  a  random  process  in  the  decentralized  controller  for 
an  agent  swarm.  Consequently,  we  analyze  the  swarm  self¬ 
organization  using  the  stochastic  Lyapunov  function  [7].  We 
show  that  our  proposed  decentralized  controller  provides  the 
robust  flocking  of  the  swarm  of  agents. 

This  paper  is  organized  as  follows.  We  introduce  a  model 
of  swarming  agents  in  section  II.  In  section  III,  we  describe 
a  deterministic  decentralized  design  of  controllers  providing 
flocking  in  a  population  of  agents.  Dealing  with  the  problem 
of  local  minima,  we  suggest  including  a  stochastic  process 
into  the  controller  in  section  IV.  In  section  V,  we  derive 
emergent  behavior  related  to  the  center  of  the  mass  of  the 
flock.  Section  VI  presents  numerical  simulations  illustrating 
the  performance  of  the  controller,  and  finally  section  VII 
presents  the  conclusions  derived  from  the  present  work. 

II.  Modeling  of  swarm  system  and  Lyapunov 

FUNCTION  DEFINITION 

The  group  of  mobile  agents  consists  of  N  fully  actuated 
agents,  each  of  whose  dynamics  is  given  by  the  double 
integrator: 


qi  =  Pi 

Pi  =  Ui(t ) 


(l) 


x  =  Ax(f)  -f  Bu(f) 
where  vector  x(f)  =  [q(f)  p(i)]2  and 
qi 

q(<)  = 


qi 

Pi 

Ui 

C-+- 

II 

,  u(f)  = 

.  Pn  . 

UN 

(2) 


(3) 


and  matrices  A  and  B  are 

A (x,  t)  =  [Iatxiv  Oatxjv]T,  B  =  [Oatxat  Ijvxw]3  •  (4) 


In  order  to  carry  out  a  stability  analysis  of  the  collective 
motion  of  agents,  a  Lyapunov  function  can  be  chosen  as  the 
total  energy 

<Kq,p)  =  ^(q)  +  ^pTp  (5) 


The  Lyapunov  function  is  composed  of  the  total  artificial 
potential  energy  V(q)  and  the  kinetic  energy,  the  second 
term  of  the  sum  given  in  equation  (5).  We  define  the  potential 
energy  as  a  non-negative  function 

1  N  i  Jt 

^(q)  =  2E^  =  2EE  %(l \Qi  Sill)  (6) 

i= 1  i= 1  j€Ni 


Vijihi  -  Qj\\)  >  o 


(7) 


where  Vi  is  the  total  artificial  potential  associated  with  an 
agent  i.  This  energy  depends  on  the  set  Nt  comprised  of  the 
agents  inside  the  communication  range  of  the  agent  i.  The 
artificial  potential  function  V?J  depends  upon  the  Eucledian 
distance  ||gj  —  q3  \ \  between  the  agent  i  and  j,  and  it  is  given 
by: 


Vij  = 


a(ln(\\qi  -  q3 1|) 
a{ln{d{)  +  I1), 


do  \ 


0  <  hi  ~  qj  ||  <  di 

hi  -Qj  II  >  di 

(8) 

where,  a  is  a  scalar  parameter.  The  parameters  d{)  and  (1\ 
respectively  represent  the  inter-agent  distance  below  which 
the  interaction  force  is  repulsive  (negative)  and  above  which 
the  interaction  force  is  zero.  Figure  1  shows  the  interaction 
potential  plotted  against  the  inter-agent  distance.  It  can  be 
easily  seen  that  the  potential  becomes  minimal  when  the 
inter-agent  distance  is  do-  The  interaction  among  agents 
happens  with  the  help  of  sensing  or  communication  devices. 
The  parameter  d±,  then,  can  be  regarded  as  a  sensing  or  a 
communication  range. 


where  qt,  pi  and  u,  are  m-dimensional  position,  velocity 
and  control  vectors  of  agent  i,  respectively.  This  double 
integrator  representing  particle  dynamics  [14],  [8],  [20]  is  a 
popular  and  realistic  model  to  represent  the  motion  of  agents 
in  a  multi-agent  system.  It  facilitates  the  implementation  of 
decentralized  control  algorithms,  and  provides  a  mechanism 
to  include  limitations  due  to  sensing  and  communication  as 
compared  to  other  models,  such  as  continuum  model  [6], 
[11],  which  represents  the  collective  motion  of  agents  in 
the  form  of  particle  density  functions.  The  model  given  in 
equation  (1)  can  be  written  as: 


Interaction  Potential 


Fig.  1.  Interaction  potential  versus  inter-agent  distance 


III.  Controller  Design  for  Flocking 

Flocking  is  a  form  of  self-organized  behavior  of  agent 
swarms  in  which  agents  meet  or  come  together.  The  col¬ 
lective  dynamics  of  the  system  can  be  analyzed  using  a 
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Lyapunov  function.  We  can  differentiate  0(q,  p)  with  respect 
to  time,  and  using  expression  (5)  and  model  (1),  one  gets: 

0(q,p)  =  pTw(q)  +  pTp 

=  pT  W(q)  +  pTu(f) 

Using  arguments  based  on  the  Lyapunov  function  stability 
analysis,  the  state  configuration  p,  q  will  be  stable  if 

<Mq,  P)  =  pTW(q)  +  pTu(f)  <  0  (9) 

The  control  law  ensuring  this  type  of  stability  is  described  by 
the  following  lemma  of  which  a  detailed  proof  is  provided 
in  references  [14]  and  [8]. 

Lemma  3.1:  Consider  a  system  of  N  mobile  agents.  Each 
of  the  agents  follows  dynamics  given  by  model  (1)  and  with 
the  feedback  control  law  given  by 

wi  =  -v9iyl  +  /^  (io) 

where  V9i  represents  the  gradient  with  respect  to  position  qt 
of  agent  i,  c  >  0  is  a  scalar  gain  and  f?  is  given  by: 

fi  =c'52(Pj-Pi)  (11) 

jeNi 

For  any  initial  condition  belonging  to  the  level  set  of  </>( q,  p) 
given  by  Lie  =  {(q,  p)  :  </>(q,  p)  <  C)}  with  C  >  0,  and 
when  the  underlying  graph  of  the  system  is  connected  and 
cohesive,  the  system  asymptotically  converges  to  an  invariant 
set  f C  Lie  such  that  the  points  in  f 2/  have  a  velocity  that 
is  bounded  and  the  velocities  of  all  agents  match. 

To  illustrate  this  lemma,  it  is  worth  mentioning  that 
the  control  law  (10)  results  from  the  fact  that  it  can  be 
equivalently  described  in  the  vector  form  as 

u(t)  = -W(q)  -  L(q)p  (12) 

where  L( q)  £  I!1" N x m N  is  m-dimensional  graph  Laplacian 
(see  reference  [14]),  which  is  a  positive  semi-definite  matrix. 
Obviously,  this  control  satisfies  condition  (9).  Also,  from 
Lasalle’s  Invariance  Principle,  all  solutions  of  the  system 
starting  in  Lie  will  converge  to  the  largest  invariant  set 
Lli  =  j(q,p)  S  Llc  ■  <£(q,p)  =0)j,  and  this  occurs  when 
the  velocities  of  all  agents  match. 

IV.  Controller  design  based  on  the  stochastic 
Lyapunov  function 

Using  the  deterministic  Lyapunov  function  controller  de¬ 
sign,  agents  always  move  in  such  a  way  that  the  covered 
distances  and  their  directions  do  not  increase  the  Lyapunov 
function.  It  is  ultimately  expected  that  the  agents  reach  a 
stable  formation  (flock)  in  which  the  Lyapunov  function 
attains  its  extremum  value.  If  the  agents  that  we  are  dealing 
with  have  a  limited  communication  range,  it  is  possible  that 
robots  reach  stable  formations  with  more  than  one  cluster. 
In  this  configuration,  agents  from  one  cluster  are  out  of  the 
communication  range  of  any  agent  from  the  other  clusters. 
The  graph  Laplacian  L(q)  is  no  longer  positive  semi-definite 
in  this  case,  and  a  local  minimum  of  the  Lyapunov  function 
is  reached. 


Here,  we  introduce  a  controller  which  includes  the  random 
process  term,  providing  means  to  an  escape  from  local 
minima.  Using  the  notation  of  section  II,  the  model  of  a 
robot  population  can  be  written  in  the  matrix  form  as 

x  =  Ax(t)  +  Bu(t)  +  E£(£)  (13) 


where 


S  = 


0 

0 

0 

diag(ai,  <72,  ■  ■  ■  &n ) 

(14) 


and  the  controller  effect  is  composed  of  the  part  which  is 
assumed  to  be  computed  based  on  the  local  agent  information 
Bu(f)  and  the  stochastic  part  S£(f). 

Using  the  Lyapunov  function  as  before  and  using  the  Ito 
formula,  we  can  find  the  time  derivative  of  the  Lyapunov 
function 


v-^  d <j>  d</>  1  v-^  2  d24>  <9d>  .  , 

=  +u*{t)dil  +  2  ^  Wi  +5>/^<(t) 


'  dq, 
that  results  in 


=  T.iPidVi{qibT'qN)  +Y,Mt)Pi  + 

I  Ei  °f  +  Ei  °iPMt) 


dpi 


(15) 


(16) 


The  latter  expression  means  that  in  this  case  the  Lyapunov 
function  <j>  of  the  robotic  population  is  a  stochastic  process. 

Similar  to  the  deterministic  case,  if  we  would  like  to 
design  a  controller  which  aligns  the  robot  velocities,  i.e., 
provides  flocking,  we  can  define  iq  as 

dVi(q1,q2,...qN) 

Ui  = - X - [L(q)p]i  (17) 

UQi 

Under  this  condition 

4>  =  -pT£(q)p+  +Y^aiPi&(t)  (18) 

i  i 

which  means  that  the  total  Lyapunov  function  value  is  a 
stochastic  process.  The  stochasticity  provides  escape  from 
the  local  minima.  We  assume,  without  losing  generality,  that 
u i  =  a.  The  intensity  of  this  stochastic  process  is  governed 
by  the  parameter  a  which  needs  to  be  determined  by  taking 
appropriate  considerations  of  factors  explained  below. 

There  is  a  stochastic  steady-state  (see  reference  [7],  page 
50,  Theorem  6)  for  the  value  of  cf>  in  which  the  following 
condition  is  satisfied: 

^{pTi(q)p}  =  =  ~y~  (19) 

There  are  two  limits  for  er  that  should  be  avoided.  One  is 
a  <  <tl  that  results  in  the  deterministic  controller.  In  this 
case  the  swarm  does  not  flock  robustly.  The  second  limit 
is  a  >  <th ,  when  cr  is  large  and  leads  to  a  large  expected 
value  of  p7  L(q)p.  This  means  that  the  robot  velocities  are 
poorly  aligned.  For  reasons  of  the  robust  flocking,  a  should 
be  in  the  range  of  rr/  and  an-  Out  of  this  range,  the  flocking 
does  not  happen  robustly  due  to  the  small  (cr  <  gl)  or  large 
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(er  >  ct//)  intensity  of  the  random  process,  respectively.  The 
value  of  a  has  to  be  tuned  taking  into  account  the  constraints 
of  the  actuator  that  drives  the  robots  and  it  is  a  part  of  the 
controller  design.  Introduced  controller  does  not  guarantee 
the  attainment  of  global  minimum  of  the  Lyapunov  function, 
but  it  can  guarantee  that  the  formation  of  the  system  is,  most 
of  the  time,  around  this  minimum. 

V.  Emergent  Behavior  of  a  Stochastic  Control 
for  Flocking 

The  cluster  is  a  group  of  robots  in  which  each  robot  is 
in  the  communication  range  of  at  least  one  robot.  Motion 
of  each  agent  i  obeys  the  following  stochastic  differential 
equation 


Qi  =  Pi  (20) 

Pi  =  Ui(t)  +  (Ji£i(t)  .  (21) 


This  expression  includes  two  parts:  the  first  part  based  on 
the  local  information  and  the  second  part  based  on  the 
stochastic  process  of  intensity  07.  We  assume  again  that  £  is 
the  Gaussian  white  noise  of  the  unit  intensity.  For  reasons 
of  simplicity,  but  without  losing  generality,  we  will  consider 
that  cr,(  =  er. 

Since  each  robot  has  a  unique  label  i,  the  cluster  Cj  can 
be  defined  as  the  set  of  labels  i  of  the  robots  being  in  the 
cluster  j.  Assuming  that  the  cluster  exists  within  the  time 
interval  r  =  [ti  £2],  the  motion  of  the  Cj  cluster’s  center  of 
the  mass  is  described  by 


pfM 


=  p 


CM 


(22) 

(23) 


where  the  number  of  the  robots  in  the  cluster  is  denoted 
by  \Cj\.  The  terms  dependent  on  it,  are  cancelled  due  to 
the  symmetry  in  the  interaction  between  any  two  robots  in 
the  cluster.  The  sum  of  white  Gaussian  noises  is  also  the 
white  Gaussian  noise  and  the  last  term  in  (23)  includes  this 
properly  scaled  unit  intensity  Gaussian  noise. 

From  (23),  we  can  conclude  that  the  center  of  the  cluster 
mass  moves  randomly  through  the  operating  space.  The 
smaller  the  cluster,  the  faster  it  ’’explores”  the  operating 
space  before  it  meets  other  robots  to  form  a  larger  cluster. 
We  quote  the  word  ’’explore”,  because  it  is  a  behavior  that 
emerges  from  the  inclusion  of  the  random  process  in  the 
robot  controller. 

Obviously,  while  exploring,  the  cluster  may  decompose 
into  two  smaller  clusters.  The  probability  of  cluster  decom¬ 
position  can  be  made  very  small  by  appropriately  choosing 
the  intensity  a  to  be  small  in  comparison  to  the  intensity 
of  it,.  Regardless  of  this,  the  cluster  decomposition  does  not 
influence  our  further  analysis,  because  we  analyze  the  steady- 
state  property  of  the  robot  formation  independently  of  how 
it  is  reached.  In  the  limit  when  Cj  is  large,  the  center  of 
the  cluster  mass  will  move  with  q*?M  «  0. 


VI.  Simulation  Results 

Before  we  present  the  simulation  results,  let  us  explain 
social  and  hierarchical  social  entropy  metrics  we  use  to 
quantify  the  degree  of  the  self-organization  of  agent  pop¬ 
ulations.  Social  entropy,  inspired  by  Shannon’s  information 
entropy  [17],  has  been  used  in  multi-agent  systems  [1]  as 
a  metric  for  diversity  in  behavior  or  properties  of  agents, 
including  diversity  in  their  spatial  locations.  This  metric 
captures  an  important  feature  of  diversity,  which  is  the 
number  of  differentiable  groups,  i.e.,  clusters,  in  a  system 
and  the  size  of  these  clusters. 

We  define  the  cluster  as  a  group  of  agents  in  which  the 
distance  between  the  agent  and  its  closest  neighbor  from  the 
same  cluster  is  smaller  than  some  threshold  value  h,  which 
is  the  communication  range  distance.  If  an  agent  does  not 
have  the  neighbor  within  a  distance  smaller  that  h,  then  this 
agent  is  alone  in  the  cluster. 

If  a  group  of  n  agents,  represented  by  the  set  R,  can  be 
divided  into  m  clusters  C\ ,  C2  ■  ■  ■  Cm,  such  that  there  is  no 
agent  which  is  simultaneously  in  two  clusters,  then  the  social 
entropy  H  of  the  system  R  is  given  by 

with  | .  |  representing  the  number  of  agents  in  the  cluster  and 
Si  |Ci|  =  |-R|>  where  |f?|  =  n  is  the  number  of  robots  in 
the  group. 

The  social  entropy  H  obviously  depends  on  the  threshold 
h.  In  order  to  develop  a  measure  that  can  accurately  represent 
the  diversity  in  a  population  of  agents  regardless  of  the  scale 
or  the  value  h,  hierarchical  social  entropy  has  been  used  in 
literature  [1],  which  is  given  by: 

r-OC 

E(R)  =  /  H(R,  h)dh  (25) 

A=o 

where  H(R1  h)  is  the  social  entropy  given  by  equation  (24). 

To  illustrate  flocking  in  the  absence  and  the  presence  of  the 
random  force,  we  simulate  a  group  of  20  agents  during  a  500 
sec.  time  span.  In  our  simulations,  the  agents  have  a  limited 
communication  range,  which  is  modeled  with  the  artificial 
potential  function  parameter  d\  =  20 m  (see  equation  (8)). 
The  other  parameter  of  this  function  is  do  =  6to,  and  the 
intensity  of  the  random  process  <7  =  0.25.  In  the  simulation 
scenario,  the  agents  are  allowed  to  move  only  in  the  100  x  100 
rectangular  area  (see  Fig.  2).  To  define  this  area  inside  the 
simulation,  we  introduce  an  additional  artificial  potential 
function  resulting  in  short-range  repulsive  forces  making 
agents  move  away  from  the  area  border. 

Figure  2  shows  the  agent  configurations  at  different  points 
of  time  of  the  simulation  when  random  force  in  the  controller 
is  not  applied.  The  agents  start  from  an  initial  randomly 
chosen  position.  By  the  terminal  time  of  the  simulation 
T  =  500  sec,  the  agents  form  three  separate  clusters,  which 
are  the  consequence  of  the  limited  communication  range. 
In  the  next  figure.  Fig.  3,  we  illustrate  the  simulation  in 
which  the  agents  start  from  the  same  configuration  as  in 
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the  previous  example,  but  now  the  random  force  inside  the 
controller  is  applied.  We  note  that  by  the  terminal  simulation 
time  the  agents  form  a  single  cluster. 
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Fig.  2.  Configurations  of  agents  at  times  T=0  (top  left),  T=167  sec  (top 
right),  T=333  sec  (bottom  left)  and  T=500  sec  (bottom  right)  when  no 
random  motion  is  applied 
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Fig.  3.  Configurations  of  agents  at  times  T=0  (top  left),  T=167  sec  (top 
right),  T=333  sec  (bottom  left)  and  T=500  sec  (bottom  right)  when  random 
motion  is  applied 

The  hierarchical  social  entropy,  given  by  equation  (25),  for 
these  two  examples  is  plotted  against  the  simulation  time  in 
Fig.  4.  From  this  figure,  it  is  evident  that  the  application 
of  random  force  results  in  lower  entropy  values,  meaning 
that  agents  aggregate  closer  to  one  another  forming  bigger 
clusters. 

In  order  to  verify  that  the  introduction  of  random  motion 
does  lead  to  better  flocking  behavior  of  agents,  we  carried 
out  two  sets  of  100  simulation  runs.  Each  run  was  carried  out 
under  the  same  conditions  as  in  the  two  presented  examples. 
In  one  simulation  set,  the  random  force  term  of  the  controller 
is  applied  and  in  the  other,  it  is  not.  For  each  simulation 
run  we  computed  the  social  entropy  at  the  terminal  time 
T  =  500  sec.  The  distribution  of  the  computed  social 
entropy  values,  for  each  set  of  simulations,  is  presented  in 
the  form  of  the  cumulative  distribution  function  shown  in 
Fig.  5.  It  can  be  seen  that  the  increase  in  the  number  of 


Fig.  4.  Hierarchical  social  entropy  of  the  population  of  agents  for  the  case 
when  no  random  motion  is  applied  (left),  and  the  case  when  random  motion 
is  applied  (right) 


simulation  occurrences  for  the  case  when  random  motion 
is  not  applied  is  slower  than  that  of  the  case  when  random 
motion  is  applied.  This  is  because  of  the  fact  that  most  of  the 
simulations  with  random  motion  have  an  entropy  which  is 
smaller  as  compared  to  simulations  with  no  random  motion. 
A  smaller  entropy  indicates  a  smaller  (spatial)  diversity,  i.e., 
better  flocking  behavior. 

The  ultimate  evidence  that  the  application  of  the  random 
force  term  leads  to  a  better  flocking  behavior  is  the  histogram 
of  number  of  simulations  with  specific  number  of  clusters 
in  the  final  configuration  for  the  two  cases,  as  shown  in 
Fig.  6.  As  can  be  seen  from  the  figure,  a  lesser  number 
of  clusters  are  formed  for  that  case  when  random  motion 
is  applied  as  compared  to  the  case  when  random  motion 
is  not  applied.  The  average  number  of  clusters  for  the  case 
when  no  random  motion  is  applied  is  3.02,  while  the  average 
number  of  clusters  when  random  motion  is  applied  is  1.51. 
This  shows  a  marked  improvement  in  flocking  behavior  and 
the  formation  of  one  giant  cluster  as  compared  to  the  case 
when  no  random  motion  is  applied.  It  may  be  noted  that  the 
separate  clusters  formed  in  the  second  case  move  randomly 
in  a  confined  space.  This  random  motion  within  a  confined 
space  results  into  the  probability  of  clusters  finding  each 
other  approaching  unity  when  the  time  approaches  infinity. 


Fig.  5.  Simple  social  entropy  at  final  configuration  of  agents  for  100 
simulation  runs 
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Fig.  6.  Number  of  clusters  of  agents  at  final  configuration  for  100 
simulation  runs  when  no  random  motion  is  applied  (left)  and  when  random 
motion  is  applied  (right) 

VII.  Conclusion 

Inspired  by  the  role  of  random  forces  in  robustness  of 
self-organization  of  cells  and  other  biological  systems,  we 
investigated  the  possibility  of  designing  a  decentralized 
controller  for  a  swarm  of  agents  in  which  a  stochastic 
process  is  included.  We  considered  the  flocking  behavior  of  a 
swarm  and  described  the  previously  considered  deterministic 
Lyapunov  function  controller  design  based  on  the  artificial 
potential  of  interactions  among  agents.  The  deterministic 
design  may  not  lead  to  desired  swarm  behaviors  because 
of  the  existence  of  agent  configurations  in  which  the  total 
potential  of  swarm  interactions  has  local  minima. 

The  design  we  considered  is  an  extension  of  the  determin¬ 
istic  one.  We  used  the  same  Lyapunov  function.  However, 
due  to  the  introduced  stochastic  process,  the  Lyapunov  func¬ 
tion  is  also  a  stochastic  process.  Consequently,  this  controller 
provides  a  means  of  escaping  from  the  local  minima  of  the 
total  potential  of  the  swarm  interactions. 

The  intensity  of  the  included  stochastic  process  is  of  great 
significance.  If  its  intensity  is  small,  then  the  swarm  behavior 
is  the  same  as  if  controlled  by  the  deterministic  controller. 
If  the  intensity  is  high,  then  the  flocking  behavior  cannot 
be  established  because  the  swarm  can  even  escape  from  the 
region  around  the  global  minimum.  Only  if  the  intensity  of 
the  stochastic  process  lies  within  some  intermediate  range, 
does  the  swarm  escape  the  local  minima  and  randomly 
explores  configurations,  which  can  ultimately  lead  the  swarm 
towards  configurations  close  to  the  global  minimum  of  the 
total  potential  of  interactions.  The  drawback  of  this  design  is 
that  even  in  the  global  minimum  configuration,  the  stochastic 
process  forces  the  swarm  to  search  for  a  better  configuration. 
Tuning  the  intensity  of  the  stochastic  process  of  the  controller 
is  identical  to  consideration  that  appears  in  the  design  of 
the  realistic  model  of  cell  behavior,  or  to  the  problem  of 
providing  physiological  conditions  for  self-organization  in 
biological  cells. 

The  performance  of  the  proposed  controller  was  illus¬ 
trated  by  the  derivation  of  the  emergent  swarm  behavior 


and  an  extensive  simulation  study.The  performance  of  the 
controller  with  the  random  process  term  was  compared  to  the 
performance  of  the  deterministic  controller.  The  simulation 
results  suggest  that  the  inclusion  of  the  random  process  in 
the  controller  can  certainly  improve  the  performance  of  the 
robotic  swarm  in  achieving  the  flocking  behavior. 
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A. 2.3  Mobile  Multi-Robot  Control  in  Target  Search  and 
Retrieval 

The  following  paper  (on  the  next  7  pages)  was  co-authored  by  Guoxian  Zhang 
and  Devendra  P.  Garg,  and  appeared  in  the  Proceedings  of  the  2008  ASME 
Dynamic  Systems  and  Control  Conference  on  pages  677-683,  published  and 
presented  in  October  2008. 
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ABSTRACT 

In  this  paper,  the  design  of  a  controller  is  proposed  for  a 
multi-robot  target  search  and  retrieval  system.  Inspired  by  re¬ 
search  in  insect  foraging  and  swarm  robotics,  we  developed  a 
transition  mechanism  for  the  multi-robot  system.  Environmen¬ 
tal  information  and  task  performance  obtained  by  the  robot  sys¬ 
tem  are  used  to  adjust  individual  robot  ’s  parameters  and  guide 
environment  exploration.  The  proposed  control  system  is  appli¬ 
cable  in  the  solution  of  multi-target  problem  also  where  several 
robots  may  be  needed  to  cooperate  together  to  retrieve  a  large 
target.  Simulations  show  that  the  task  performance  improves 
significantly  with  the  proposed  method  by  sharing  information 
in  parameter  learning  and  environment  exploration. 

INTRODUCTION 

The  interest  in  developing  robots  that  can  cooperate  together 
to  perform  tasks  is  increasing,  since  some  tasks  may  be  diffi¬ 
cult,  inefficient,  or  impossible  for  a  single  robot  to  perform  [1], 
The  cooperative  intelligence  of  animals  can  be  seen  in  nature 
especially  in  social  insects,  such  as  ants  and  bees  [2],  These 
insects  cooperate  to  search  for  prey  and  retrieve  them  to  their 
nest  with  only  indirect  communication  and  local  sensing  ability 
among  members  and  the  environment.  Inspired  by  these  insects, 
a  group  of  simple  robots  may  use  cooperation  to  perform  target 
location  and  retrieval  [3-6],  The  research  on  multi -robot  forag¬ 
ing  has  a  variety  of  possible  applications  such  as  demining,  toxic 
odor  source  search,  and  treasure  hunting  [7],  where  robots’  co¬ 
operation  may  increase  the  system’s  performance. 

The  foraging  behavior  of  ants  was  studied  by  Holldobler  and 
Wilson  [8]  with  two  characteristics:  ants  randomly  explore  the 
environment  to  search  for  prey;  and  the  prey  is  dragged  by  in¬ 
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dividual  or  collective  retrieval  straight  to  the  nest  according  to 
its  weight.  Three  kinds  of  learning  foraging  (local  performance- 
based  reinforcement,  global  performance-based  reinforcement, 
and  local  shaped  reinforcement)  were  discussed  by  Balch  in  [9]. 
In  a  macro  view,  differential  equations  can  be  used  to  model  the 
flow  of  robot  state  changes  in  the  system  [10].  An  analytical  and 
spatial  swarm  robots  foraging  model  using  virtual  pheromone 
based  partial  differential  equations  was  discussed  by  Hammann 
and  Worn  in  [11],  Labella  and  Dorigo  [5]  attempted  to  improve 
the  performance  of  robot  foraging  by  optimizing  each  robot’s 
probability  of  changing  state  from  rest  to  search  under  its  previ¬ 
ous  performance  in  target  searching  and  retrieving.  Experience 
of  other  members  was  ignored. 

Liu  [3]  utilized  environmental  cues,  internal  cues,  and  so¬ 
cial  cues  to  change  each  robot’s  threshold  for  search  and  rest 
based  on  all  robots’  performance  in  foraging,  including  interfer¬ 
ence  between  robots  and  obstacles.  This  method  can  improve  the 
system’s  net  energy  income  compared  to  systems  without  adap¬ 
tion.  Net  energy  income  is  the  energy  value  of  the  total  retrieved 
targets  minus  total  system  energy  consumption.  However,  re¬ 
searchers  Ren  and  Williams  [12]  indicated  that  communication 
is  not  necessary  in  multiagent  learning  for  robot  foraging. 

Different  from  decentralized  control,  Ogras,  Dagci  and  Oz- 
guner  [13]  presented  a  hierarchical  control  architecture  in  coop¬ 
erative  control  for  multi-robot  target  search.  This  method  needs 
a  supervisor  in  the  system  to  decompose  the  goal  into  tasks  and 
delegate  them  to  robots.  Besides  foraging,  multi-robot  system  is 
also  studied  in  coverage  problem.  A  review  of  methods  on  robot 
coverage  was  given  by  Choset  in  [14].  Hazon  and  Kaminka  [15] 
showed  that  redundant  coverage  could  guarantee  better  perfor¬ 
mance  in  coverage  time  than  non-redundant  coverage.  Wagner, 
Lindenbaum,  and  Bruckstein  [16]  proposed  an  ant-robots  system 
using  smell  traces  that  gradually  vanish  with  time.  This  method 
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could  complete  the  traversal  even  when  some  of  the  members 
die  or  the  graph  changes.  In  some  applications,  such  as  treasure 
hunting  and  odor  source  search,  methods  in  foraging,  retrieval 
and  coverage  may  be  combined  to  improve  the  system  perfor¬ 
mance. 

In  some  tasks,  such  as  demining,  different  kinds  of  targets 
may  be  deployed  in  an  area  and  the  target  density  decreases  dur¬ 
ing  the  process.  In  this  paper,  we  develop  a  method  for  a  group 
of  robots  performing  multiple  target  search  and  retrieval  tasks. 
Targets  with  different  values  and  weights  are  initially  deployed 
in  an  unknown  environment.  The  robot  system  is  required  to 
search  in  the  unknown  environment  to  find  targets  and  try  to  re¬ 
trieve  them  to  the  nest  within  a  certain  time  period.  Only  local 
communication  among  robots  and  local  sensing  of  the  environ¬ 
ment  are  assumed  to  be  available  during  the  entire  task.  The 
effect  of  communication  among  robots  is  studied  and  used  to  ad¬ 
just  robots’  parameters  and  search  process. 

PROBLEM  FORMULATION 

This  paper  addresses  the  problem  of  generating  a  control 
strategy  for  a  robot  group  involved  in  target  search  and  retrieval 
tasks.  The  robots  are  initially  deployed  in  a  specific  area  called 
the  nest,  denoted  by  3\£,  in  an  area  SL.  All  of  the  robots  are 
assumed  to  have  the  same  properties.  For  example,  they  have 
limited  sensor  field  of  view  denoted  by  S,  and  load  carrying 
ability  wq.  The  workspace  is  populated  with  n  fixed  obstacles 
{$1 , . . . ,  $„}  C  SL,  and  m  targets  %,}  C  SI.  Besides  loca¬ 

tion,  each  target  %  has  two  other  properties,  value  v(-  and  weight 
Wi¬ 
ki  order  to  gather  the  targets  into  the  nest,  robots  need  to  first 
search  in  SI  to  locate  the  targets,  and  when  a  target  is  found,  the 
robot  may  choose,  according  to  the  target’s  weight,  to  pull  it  into 
the  nest  individually  or  ask  for  recruitment.  According  to  the  en¬ 
ergy  and  time  constraint,  there  may  not  be  enough  time  for  the 
robots  to  locate  and  retrieve  all  the  targets  in  SL  to  C\[.  The  prob¬ 
lem  then  changes  to  achieving  an  optimal  group  performance  in 
the  given  time  period  T .  The  group  performance  can  be  mea¬ 
sured  by  the  net  energy  gain  of  the  group.  When  a  target  %  is 
retrieved  to  the  energy  gain  CSf  of  the  retrieval  can  be  con¬ 
sidered  as  the  difference  between  the  target  value  v,  and  the  total 
energy  consumption  c;-  to  retrieve  this  target.  In  this  problem,  the 
energy  consumption  is  assumed  to  be  proportional  to  the  robots’ 
traveling  distance 

Ci  =  krkod  (1) 

where  d  is  the  retrieval  distance,  ko  is  the  coefficient  relating 
the  distance  and  energy  consumption  in  retrieval,  and  kr  is  the 
number  of  robots  retrieving  the  target.  Then  the  net  energy  gain 
can  be  represented  as 

Gq-.  =  Vi  -  krkQd  (2) 


Figure  1 .  TRANSITION  DIAGRAM  OF  THE  ROBOTS 


Besides  energy  consumption  in  the  retrieval  task,  the  robot 
system’s  energy  consumption  may  also  occur  in  robots  that  are 
searching,  recruiting,  and  even  resting.  Considering  the  summa¬ 
tion  of  the  consumption  above  as  cq,  and  if  I  denotes  the  set 
of  indices  of  targets  which  have  been  retrieved  successfully,  the 
total  net  energy  income  of  the  robot  system  which  can  be  consid¬ 
ered  as  the  performance  index  is 

G  =  T,e  jGqt  —  co  (3) 

The  final  object  of  this  problem  is  to  improve  the  value  of 
G  in  a  given  time  period  T  while  searching  the  proper  area  and 
retrieving  proper  targets. 

PROPOSED  APPROACH 

Inspired  by  the  observation  of  ants  foraging  and  considering 
the  time  constraint  in  our  problem,  we  assume  that  all  the  robots 
have  the  same  properties,  such  as  moving  speed,  load  carrying 
ability,  and  sensor  range.  The  energy  is  consumed  by  robots  in 
the  system  with  different  rates  according  to  their  states,  for  in¬ 
stance  searching  and  resting.  Since  in  this  problem  targets  are 
deployed  at  the  beginning,  each  robot  can  utilize  the  environ¬ 
mental  information  obtained  both  by  itself  and  other  members  to 
adjust  its  parameters  and  improve  the  search  efficiency.  With  fi¬ 
nite  state  machine  strategy  the  transition  diagram  of  the  foraging 
task  is  illustrated  in  Fig.  1.  The  control  of  the  foraging  task  is 
decomposed  into  several  states.  The  states  of  the  robot  include: 

Rest :  The  robot  stays  in  the  nest. 

Search :  The  robot  randomly  explores  the  environment  in  or¬ 
der  to  find  targets. 

Recruitment :  The  robot  moves  back  towards  the  nest  in  or¬ 
der  to  ask  for  help  from  other  members  to  retrieve  the  target  it 
has  found. 

Help:  The  robot  moves  to  the  located  target  with  some  other 
robots  in  order  to  retrieve  it  together. 

Retrieve:  The  robot  carries  the  target  towards  the  nest. 

The  arrows  in  Fig.  1  show  the  possible  transitions  between 
robot  states.  The  possibility  of  a  robot  to  change  to  another  state 
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Table  1 .  VARIABLES  IN  ROBOTS  STATES  TRANSITION  AND  CON¬ 
TROL 


Variable  Description 

N  total  number  of  robots  in  the  system 

Nr  number  of  robots  in  rest  state 

Ns  number  of  robots  in  search  state 

Nrt  number  of  robots  in  retrieve  state 

Nrc  number  of  robots  in  recruitment  state 

Nh  number  of  robots  in  help  state 

Er  energy  consumption  of  robot  in  rest  per  second 

Es  energy  consumption  of  robot  in  search  per  second 

Ert  energy  consumption  of  robot  in  retrieve  per  second 

Erc  energy  consumption  of  robot  in  recruitment  per  second 

Ei,  energy  consumption  of  robot  in  help  per  second 

X,  number  of  robots  flow  between  two  states  (i  =  1 , . . . ,  8) 


depends  on  the  performance  of  the  robot  system  and  the  environ¬ 
ment  information  it  has  obtained.  The  variables  in  this  control 
graph  are  shown  in  Tab.  1 

At  time  t,  the  change  of  robots  in  each  state  can  be  repre¬ 
sented  by  XjS  at  time  t,  for  example 

dNr(t)  =  -Xi(t)+X2(t)+X5(t)-Xs(t)  (4) 

If  the  indices  of  targets  retrieved  successfully  up  to  time  t  is 
denoted  by  the  set  I(t),  the  energy  gain  at  time  t  can  be  repre¬ 
sented  as: 


dG(t)  =  dXieI^Vi  —  dNr(t)Er  —  dNs(t)Es 

- dNh(t)Eh  -  dNrt{t)Eyt  -  dNrc(t)Erc  (5) 

The  second,  third,  and  fourth  terms  are  corresponding  to  the 
change  of  co  at  time  1  in  eq.  (3).  Nr,  and  Nrc  are  corresponding 
to  kr  in  eq.  (2).  Erl  and  Erc  are  corresponding  to  kodo  in  eq. 
(2),  where  do  represents  robot’s  traveling  distance  per  second. 
The  purpose  of  this  exercise  is  to  improve  G(T)  in  a  given  time 
period  [0.  T],  In  this  paper,  we  try  to  control  A,s  during  the  time 
period  [0,  T }  to  obtain  a  large  net  energy  income  for  the  entire 
time  period.  The  control  strategy  of  these  transition  variables  are 
as  follows: 

X\[t  y.  Define  a  threshold  T,  (t )  which  decreases  when  a  tar¬ 
get  is  retrieved  by  a  robot  or  other  members  and  increases  when 


a  robot  or  other  members  come  back  to  the  nest  with  no  target 
found.  When  the  robot  has  been  in  rest  state  not  less  than  Tr(t), 
it  changes  to  search  state. 

X2(t):  Define  a  threshold  Ts(t)  which  increases  when  a  target 
is  retrieved  by  a  robot  or  other  members  and  decreases  when  a 
robot  or  other  members  come  back  to  the  nest  with  no  target 
found.  When  the  robot  has  been  in  search  state  not  less  than 
Ts(t),  it  changes  to  a  temporary  homing  state  and  when  it  arrives 
at  the  nest  it  changes  to  rest  states.  Ts(t)  is  updated  for  a  robot 
when  it  is  about  to  leave  the  nest  to  search  and  is  kept  constant 
during  the  robot’s  entire  foraging  period. 

X^(t):  Define  a  threshold  Trc(t )  which  averages  the  energy 
gain  of  the  robot  system  over  time.  When  a  large  target  is  first 
found,  and  the  net  energy  gain  to  ask  teammates  to  retrieve  it  is 
higher  than  pTrc(t),  where  p  is  a  constant,  the  robot  changes  its 
state  to  recruitment. 

X4  (t):  When  a  robot  finds  a  target  which  is  movable  by  it 
alone,  it  changes  its  state  to  retrieve. 

Xs(t):  When  a  robot  successfully  retrieves  a  target  to  the 
nest,  it  changes  its  state  to  rest. 

X(,{t):  When  a  robot  in  recruitment  arrives  at  the  nest  and 
finds  enough  teammates,  it  changes  to  help  state.  Otherwise,  if 
there  are  not  sufficient  teammates  in  the  nest,  it  changes  to  a 
temporary  wait  state  until  the  required  number  of  teammates  is 
met. 

X~i (f):  When  a  robot  in  the  help  state  arrives  at  the  target,  it 
changes  to  a  temporary  wait  state,  and  then  changes  to  retrieve 
state  when  all  the  members  arrive. 

Xs(t):  When  a  robot  is  in  the  rest  state  and  is  asked  by  a 
robot  in  recruitment  state,  it  changes  to  a  temporary  wait  state. 
When  there  are  enough  members  it  changes  to  help  state. 

We  assume  that  the  communication  among  robots  can  only 
happen  when  they  are  in  the  nest.  By  this  assumption,  the  diver¬ 
sity  of  information  held  by  each  robot  is  small  and  may  make  the 
control  strategy  concise.  In  this  case,  the  parameters,  such  as  Tr, 
7'v  and  Trc,  for  each  robot  can  only  be  adjusted  when  it  is  in  the 
nest.  When  the  robot  is  foraging,  all  of  these  parameters  are  kept 
constant.  Parameters  for  different  robots  may  be  different  due  to 
their  different  time  to  leave  the  nest. 

The  obtained  environmental  information  can  also  be  used 
to  guide  the  robot’s  search  process.  First  the  searching  field  is 
equally  divided  into  n  regions  as  shown  in  Fig.  2.  We  assume 
that  each  robot  has  the  ability  to  remember  the  environment  it 
has  explored.  The  search  strategy  can  be  represented  as  follows: 

1 .  When  each  robot  arrives  at  the  nest  back  from  foraging, 
it  shares  its  exploration  experience  with  other  robots  in  the  nest. 

2.  When  a  robot  changes  to  search  state,  it  chooses  the  re¬ 
gion  which  has  been  explored  by  fewest  number  of  robots  based 
on  the  robot’s  learned  information.  Before  leaving  the  nest,  the 
robot  informs  all  the  robots  in  the  nest  the  region  it  has  chosen. 

3.  A  random  unknown  area  in  the  chosen  region  is  generated 
as  the  primary  goal,  and  the  robot  searches  towards  the  goal  area 
first.  If  no  target  is  found,  then  it  continues  its  random  search  in 
the  field. 
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Table  3.  ENERGY  CONSUMPTION  PER  STEP  IN  EACH  STATE 


Figure  2.  REGION  DIVISION  OF  THE  FIELD 


Table  2.  TYPICAL  TARGET  PROPERTIES 


Type 

1 

2 

3 

4 

5 

6 

7 

Weight 

1 

2 

2 

2 

4 

4 

4 

Value 

400 

400 

800 

1200 

800 

1600 

2400 

5.  When  searching,  the  robot  preferentially  covers  the  area 
which  has  not  been  explored  by  other  robots  according  to  its  ob¬ 
tained  information.  If  all  the  possible  moving  directions  are  oc¬ 
cupied  by  obstacles  or  other  robots,  the  robot  chooses  to  stay 
until  a  free  place  appears. 

The  performance  of  the  control  strategy  mentioned  above  is 
simulated  as  described  in  the  following  section. 


SIMULATION  EXAMPLE 

In  this  section,  a  simulation  system  is  developed  to  test  the 
method  proposed  above.  The  following  assumptions  are  made  in 
the  simulation. 

(1)  The  time  is  discretized  into  small  finite  steps. 

(2)  The  space  and  targets  are  decomposed  to  be  squares. 

(3)  Each  robot  is  assumed  to  take  the  shape  of  a  square  with 
the  ability  to  sense  the  eight  squares  adjacent  to  it  and  load  car¬ 
rying  ability  of  one  weight  unit. 

(4)  The  information  of  the  members,  such  as  their  success 
in  retrieval  and  area  already  explored,  can  be  transferred  among 
robots.  The  information  is  transferred  if  and  only  if  both  the 
sender  and  the  receiver  are  in  the  nest. 

(5)  The  robot  can  move  up,  down,  left  or  right  one  square  at 
each  time  step. 

(6)  Targets  are  assumed  to  occupy  one  square,  with  typical 
values  and  weights  specified  in  Tab.  2. 
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Figure  3.  AN  EXAMPLE  OF  SIMULATION  FIELD  REGION 


(7)  Energy  consumption  is  assumed  as  shown  in  Tab.  3 

The  simulation  field  is  shown  in  Fig.  3.  The  boundary  of  the 
nest  is  represented  by  wide  lines;  robots  are  shown  as  triangles; 
obstacles  are  shown  as  circles,  and  targets  are  shown  as  rectan¬ 
gles. 

The  control  for  each  robot  is  discretized  at  each  time  step. 
When  a  robot  reaches  the  nest,  it  may  communicate  with  other 
robots  in  the  nest  and  share  the  information  about  the  field  it  has 
explored  and  its  current  task  performance.  Every  robot  at  rest 
state  computes  its  Tr  based  on  the  updated  information,  and  if 
timeout,  it  changes  to  search  state  and  computes  its  Ts  and  TK  by 
the  obtained  information  before  leaving  the  nest.  A  control  flow 
diagram  is  shown  in  Fig.  4. 

An  example  of  the  simulation  field  at  200th  time  step  is 
shown  in  Fig.  5.  In  this  figure  we  can  see  that  one  target  is 
retrieved  by  four  robots  and  another  target  is  retrieved  by  a  sin¬ 
gle  robot. 

We  create  simulations  under  two  conditions  to  examine  the 
performance  of  our  method. 

1.  Robot  uses  shared  information  to  adjust  its  parameters 
and  guide  its  search  process. 

2.  Robot  uses  its  own  information  to  adjust  its  parameters 
and  guide  its  search  process. 

All  the  robots  are  assumed  to  be  located  in  the  nest  at  the  be¬ 
ginning.  At  the  first  step  all  of  them  change  to  search  state.  After 
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Figure  4.  FLOW  CHART  FOR  EACH  ROBOT 
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Figure  5.  AN  EXAMPLE  OF  SIMULATION  FIELD  REGION  AT  200TH 
TIME  STEP 


the  first  robot  comes  back  to  the  nest  from  foraging,  it  changes  to 
a  special  ’’queen”  state.  The  robot  at  ’’queen”  state  will  stay  in  the 
nest  for  the  entire  remaining  time  period  and  share  information 
with  a  robot  which  has  just  arrived  at  the  nest  or  is  about  to  leave 
the  nest.  This  way,  any  robot  returning  to  the  nest  may  have  ac¬ 
cess  to  the  most  updated  and  complete  information.  This  allows 
information  in  the  nest  to  be  consolidated.  Each  simulation  has  a 
total  of  1000  time  steps,  and  all  the  results  are  the  average  of  50 
runs.  Net  energy  gains  of  the  systems  with  10,  15  and  20  robots 
under  conditions  1  and  2  are  shown  in  Tab.  4. 

The  net  energy  gain  versus  time  for  a  total  robot  population 
of  15  is  shown  in  Fig.  6;  the  number  of  robots  in  search  state 
versus  time  is  shown  in  Fig.  7;  and  the  number  of  robots  in  the 
nest  versus  time  is  shown  in  Fig.  8. 

From  the  results  in  Tab.  4  and  Fig.  6,  we  can  see  that  the 
use  of  members’  information  on  the  environment  is  important  to 
assess  the  system  performance.  Without  sharing  information  the 


Table  4.  NET  ENERGY  GAIN  UNDER  DIFFERENT  CONDITIONS 


Condition 

1 

2 

£io 

6493 

568 

Eis 

5862 

-1265 

E20 

4914 

-2549 

Figure  6.  NET  ENERGY  GAIN  IN  MULTI-TARGETS  FORAGING 


system’s  net  energy  income  decreases  at  about  the  300th  step. 
The  possible  reason  is  that  without  knowing  other  members’  ex¬ 
ploration  information,  it  is  highly  possible  that  the  robot  will 
continue  to  search  into  a  place  which  has  been  explored  by  other 
robots.  This  would  waste  the  system  energy  and  time.  For  both 
conditions,  the  net  energy  gain  decreases  as  the  number  of  robots 
increases.  With  finite  targets  deployed  in  the  field,  the  higher  the 
number  of  robots,  the  higher  is  the  system’s  energy  consumption. 
Moreover,  the  system  with  higher  number  of  robots  has  a  higher 
probability  of  interference  among  robots,  targets,  and  obstacles. 
This  will  decrease  the  efficiency  of  foraging  and  retrieving.  Fig. 
7  and  Fig.  8  show  that  under  condition  1  fewer  robots  are  in 
search  state  most  of  the  time  and  more  robots  are  in  rest  or  wait 
state  in  the  nest.  These  results  are  consistent  with  those  shown 
in  Tab.  4.  From  Tab.  4,  since  the  net  energy  decreases  as 
the  number  of  robots  increases,  we  can  see  that  the  system  with 
15  robots  seems  to  have  more  robots  than  the  task  needs.  By 
communicating  and  sharing  the  information,  the  system  under 
condition  1  divides  robots  into  different  states  with  fewer  robots 
foraging  and  more  robots  in  the  nest,  which  achieves  a  lower  sys¬ 
tem  energy  consumption.  This  shows  the  system  under  condition 
1  has  a  better  adaption  to  the  environment. 


5 


Copyright  ©  2008  by  ASME 


Figure  7.  ROBOTS  IN  SEARCH 


Figure  8.  ROBOTS  IN  THE  NEST 


CONCLUSION 

In  this  paper,  we  have  presented  a  methodology  for  multi¬ 
robot  system  used  in  a  target  foraging  and  retrieval  problem.  Fi¬ 
nite  State  Machine  is  used  to  control  the  robot  behaviors.  In¬ 
formation  obtained  by  the  individual  robots  is  shared  in  the  nest 
and  is  utilized  to  adjust  each  robot’s  parameters  in  state  transi¬ 
tions.  Shared  information  is  also  used  to  create  a  strategy  which 
helps  the  robots  to  choose  a  search  region  and  guide  their  move¬ 
ment  during  foraging.  Simulation  results  show  that  communica¬ 
tion  of  individual  robot’s  experience  on  task  and  field  searching 
could  improve  the  system  net  energy  gain  up  to  10  times  in  cho¬ 
sen  conditions  compared  to  the  case  without  sharing  information 
and  keep  at  a  high  system  net  energy  gain  when  the  later  one 


decreases  significantly. 
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A. 2.4  Sensing  and  Estimation  on  a  Modular  Testbed  for 
Swarm  Robotics 

The  following  paper  (on  the  next  8  pages)  was  co-authored  by  Gregory  K.  Fricke, 
Devendra  P.  Garg,  and  Dcjan  Lj.  Milutinovic,  and  appeared  in  the  Proceedings 
of  the  2009  ASME  Dynamic  Systems  and  Control  Conference,  Volume  2,  on 
pages  195-202,  published  and  presented  in  October  2009. 
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ABSTRACT 

Collective  robotics  offers  the  promise  of  enhanced  perfor¬ 
mance  and  robustness  relative  to  that  of  individual  robots,  with 
decreased  cost  or  time-to-completion  for  certain  tasks.  Having 
many  degrees  of  freedom,  the  multi-robot  control  and  estima¬ 
tion  problems  are  challenging,  specifically  when  the  solutions 
require  a  great  amount  of  communication  among  the  robots. 
While  numerical  simulation  is  a  critical  tool  in  swarm  robotics 
research,  verification  of  obtained  results  under  a  physical  real¬ 
ization  of  the  swarm  is  far  from  routine.  Therefore,  we  have  de¬ 
veloped  and  used  a  sensor-integrated  testbed  for  the  validation  of 
cooperative-robotics  algorithms,  observation  of  swarm  behavior, 
and  measurement  of  system  performance. 


INTRODUCTION 

Conducting  experiments  in  swarm  robotics  is  challenging, 
time-consuming,  and  expensive,  leading  many  investigators  in 
the  field  to  extensively  rely  on  computer  simulations.  While 
simulation  is  indispensable  for  algorithm  development  and  per¬ 
formance  estimation,  the  physical  realization  of  a  swarm  system 
is  important  for  full  understanding  and  validation  of  swarm  be¬ 
havior.  Also,  physical  experiments  shed  light  on  the  nature  of 
uncertainty  sources  in  the  sensors,  actuators,  and  communication 
pathways  specific  to  mobile  robotic  systems  while  illustrating  the 
logistical  difficulties  involved  in  implementing  swarm  behavior. 

Robot  pose  estimation  is  critical  for  feedback  control  of 
robot  trajectory.  Methods  for  pose  estimation  and  robot  local¬ 
ization  have  been  extensively  studied,  utilizing  methods  such 
as  dead-reckoning  [1-6],  on-board  and  off-board  computer  vi¬ 
sion  [7,8],  environmental  sensing  [9,10],  collective  or  distributed 
localization  [11,  12],  and  simultaneous  localization  and  map¬ 


ping  (SLAM)  [13]  using  laser-scanning  systems  [14,  15],  on¬ 
board  cameras  [16],  RFID  tags  and  readers  [17, 18],  or  ultrasonic 
rangefinders  [10]. 

Pose  estimation  and  tracking  of  individual  robots  are  critical 
to  the  metrology  required  for  complete  evaluation  of  experimen¬ 
tal  results.  In  the  case  of  multi-robot  systems,  the  scalability  of 
the  metrology  system  is  a  critical  concern.  With  multiple  robots, 
reliance  on  advanced  vision-based  techniques  quickly  surpasses 
all  but  the  most  advanced  computational  systems,  imposing  the 
restriction  that  performance  data  must  be  evaluated  off-line  with¬ 
out  real-time  constraints. 


‘Address  all  correspondence  to  this  author. 


Figure  1 .  Khepera-ll  arena  of  the  RAMA  Lab. 
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The  modular  testbed  described  in  this  paper  has  been  devel¬ 
oped  at  the  Robotics  and  Manufacturing  Automation  (RAMA) 
Laboratory  at  Duke  University.  The  testbed,  shown  in  Fig.  1, 
utilizes  multiple  Khepera-II  robots  operating  within  a  defined 
swarm  arena  equipped  with  two  SICK  LMS400  LADAR  sensors 
and  an  overhead  Cognex  Insight  5400  industrial  machine -vision 
system.  Researchers  at  other  institutions  have  recently  described 
similar  multi-robot  testbeds,  e.g.,  see  [15, 19,20].  Our  focus  has 
been  to  maximize  utility,  maintain  scalability,  and  avoid  overly 
specialized  and  cost-prohibitive  equipment. 

The  paper  is  organized  as  follows.  First,  the  swarm  mem¬ 
bers,  LADAR  units,  and  vision  sensor  are  described.  Next,  the 
estimation  methods  developed  to  utilize  the  data  are  discussed, 
and  the  communication  methods  employed  are  presented.  The 
final  sections  are  devoted  to  the  results  of  simulated  experiments 
using  real  raw  data,  including  a  discussion  of  plans  for  future 
development. 

SWARM  MEMBERS 

The  swarm  members  in  use  are  Khepera-II  robots  built  by  K- 
Team,  shown  in  Fig.  2.  The  robots  are  non-holonomic,  featuring 
two-wheeled  differential-drive.  The  wheels  are  independently 
controllable  via  calls  to  supplied  position-  or  velocity-control 
functions,  and  their  positions  are  measured  by  wheel  encoders 
with  Y^mm  resolution.  According  to  Canudas  de  Wit’s  wheeled- 
robot  definitions  in  [21],  this  is  a  Type  (0,2)  robot.  The  Khepera- 
II  robot  bases  are  nominally  69 mm  in  diameter,  and  34 mm  tall 
(from  bottom  of  wheels  to  top  of  base  circuit  board).  The  wheels 
are  15 mm  in  diameter,  and  the  wheel  separation  is  52.5 mm  from 
center  to  center. 


Figure  2.  Khepera-II  with  top  turret  covered  by  white  paper  disc. 


The  robots  utilize  a  Motorola  32-bit  processor  and  Flash 
memory  for  storing  user-created  programs.  Each  robot  has  eight 
infrared  proximity  sensors,  a  modular  expansion  system  allow¬ 
ing  additional  functionality,  and  a  rechargeable  battery  system 
providing  roughly  one  hour  of  operation  on  full  charge. 

In  the  following  subsections,  we  provide  a  brief  description 
of  the  robot  components,  including  the  difficulties  in  using  them 


based  on  our  experience. 

Programming 

Implementing  custom  operational  programs  for  the  robots 
is  straightforward  when  utilizing  the  KTProject  library  of  func¬ 
tions  (included  with  the  purchase  of  Khepera-II  robots  or  via 
download  from  the  K-Team  website).  The  KTProject  IDE  in¬ 
cludes  several  sample  C  source  code  files  that  can  be  quickly 
compiled  and  downloaded  to  the  robots  via  the  wired  COM  link. 
The  robots  provide  a  user-flash  capability,  allowing  the  user  to 
burn  an  image  into  memory  after  download. 

Motion  Control 

The  library  of  functions  provided  in  the  KTProject  IDE  in¬ 
cludes  closed-loop  motor  control  functions.  The  inclusion  of 
such  functions  facilitates  simple  development  and  quick  integra¬ 
tion  of  user  functions,  albeit  with  some  shortcomings. 

While  the  supplied  wheel  velocity  PID  controller  delivers 
fairly  repeatable  results,  there  is  an  apparent  lack  of  an  integra¬ 
tor  anti-windup  mechanism.  Thus,  if  a  robot  encounters  an  ob¬ 
ject  that  impedes  its  commanded  motion,  the  torque  commands 
to  the  wheels  continue  to  increase  until  either  the  wheel-surface 
friction  is  overcome  or  the  robot  pushes  past  the  obstacle.  In  the 
latter  case,  the  robot  accelerates  to  velocities  much  greater  than 
the  maximum  values  set  in  the  controller.  This  issue  must  be 
overcome  with  custom  programming. 

The  position  control  is  quite  useful  for  commanding  open- 
loop  trajectories  with  simple  geometry.  It  uses  a  time-optimal  ve¬ 
locity  profile,  commanding  the  maximum  acceleration  until  max¬ 
imum  velocity  is  achieved,  followed  by  a  coast  phase,  and  con¬ 
cluding  with  maximum  deceleration  to  stop  at  the  commanded 
position.  This  time-optimal  servo-control  works  well  for  single- 
degree-of  freedom  actuators.  This  1-DOF  control  method  fails 
for  motion  of  a  body  whose  kinematics  couple  the  actuators. 
Since  the  target  position  is  expressed  as  a  specific  number  of 
encoder  counts  for  each  wheel,  it  is  easy  to  see  that  the  final  posi¬ 
tion  and  orientation  of  the  robot  may  vary  greatly  from  the  target 
position  and  orientation  if  the  rate  of  each  wheel  is  not  moni¬ 
tored  closely.  Small  errors  in  the  control  loops  cause  significant 
variations  in  robot  angular  rate,  i.e.,  changing  the  orientation  or 
heading. 

Sensing  Ability 

Each  robot  is  equipped  with  eight  infrared  proximity  sen¬ 
sors.  When  used  to  detect  non-absorptive  and  non-dissipative  ob¬ 
stacles,  the  sensors  perform  well  with  a  usable  range  of  roughly 
1 0-5 0mm.  The  minimum  detection  distance  does  not  adversely 
affect  the  utility  of  the  data,  as  the  sensors  are  set  back  in  the  body 
of  the  robot  roughly  10 mm.  The  steep  decay  of  the  negative- 
exponential  sensing  function  may  induce  large  errors  when  using 
the  proximity  sensors  for  distance  measurement.  Additionally, 
the  maximum  detection  distance  may  vary  widely  depending  on 
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the  incident  surface  and  ambient  light  level.  In  particular,  flat- 
black  surfaces  prove  nearly  undetectable  by  the  sensors.  As  such, 
these  sensors  are  most  effectively  used  to  detect  the  presence  of 
an  obstacle  rather  than  to  measure  the  distance  to  an  obstacle. 

The  sensor  data  is  easily  accessed  through  the  included  soft¬ 
ware  library.  The  difficulty  lies  in  using  the  data  when  the  objects 
in  the  environment  vary  in  reflectivity. 

Communication  modes 

The  modular  expansion  system  of  the  Khepera-II  robots  al¬ 
lows  the  capabilities  to  be  extended  via  modules  called  turrets, 
including  the  General  Purpose  I/O  turret  (GPIO),  the  Gripper 
Turret,  the  Radio  Turret  (RT)  and  the  High  Speed  Radio  Turret 
(HSRT).  Without  the  radio  turrets,  only  simple,  wired  two-way 
communication  with  typical  PC  hardware  is  possible.  The  com¬ 
munication  method  to  be  used  is  dictated  largely  by  the  specific 
experiment  being  conducted.  In  this  research,  we  consider  both 
robot-to-computer  and  robot-to-robot  communication. 

For  robot-to-robot  communication,  the  RT  must  be  used, 
as  the  HSRT  is  incapable  of  direct  communication  between 
robots.  This  limitation  of  the  HSRT  arises  due  to  the  host-client 
paradigm  of  the  Bluetooth  protocol.  In  a  decentralized  control 
environment,  a  centralized  communication  “switchboard”  would 
be  required  in  order  to  utilize  the  HSRT.  Additionally,  systems 
using  the  HSRT  are  limited  to  seven  members  (again  due  to  the 
Bluetooth  protocol). 

For  robot-to-computer  communication,  there  are  three  off- 
the-shelf  choices.  The  first,  cheapest,  easiest,  and  most  reliable 
choice  is  to  use  the  direct-wired  serial  port.  This  mode  of  com¬ 
munication  has  proved  to  be  reliable  both  for  executing  the  built- 
in  functions  of  the  robots  as  well  as  for  initiating  a  custom  ex¬ 
ecutable  and  receiving  status  information  from  the  robots.  For 
development,  the  wired  communication  is  the  best  choice.  How¬ 
ever,  the  physical  constraint  of  the  wires  eliminates  this  option  in 
a  multiple  robot  or  high-mobility  setting.  The  second  choice  is 
the  HSRT,  which  allows  relatively  high  data  transmission  rates 
(up  to  115200  baud).  The  final  choice  is  the  RT,  which  has 
proven  to  be  somewhat  less  than  optimal.  The  data  rate  is  lim¬ 
ited  to  only  9600  baud,  and  the  connection  is  very  susceptible 
to  radio  frequency  interference.  For  simple  commanding  of  the 
robots,  the  RT  has  proven  to  be  adequate,  but  establishing  re¬ 
liable  communication  under  a  custom  executable  has  not  been 
achieved.  The  RT  is  currently  in  use  for  all  of  the  experiments 
underway  at  the  RAMA  Lab,  but  in  the  future  either  the  HSRT 
or  a  custom  alternative  (such  as  the  custom  IR  communication 
turrets  developed  in  [19])  must  be  employed. 

SENSORS 

External  position  measurements  are  available  to  the  central 
supervisor  from  a  Cognex  Insight  5400  vision  system  and  a  pair 
of  SICK  LMS400  LADAR  rangefinders.  The  vision  system  is 
configured  to  return  arbitrarily-ordered  pairs  of  x-y  position  data 
for  a  pre-specified  number  of  detected  robots.  To  maximize  the 


sampling  frequency,  we  use  simple  blob  detection  based  on  a 
background  contrast  comparison  and  an  area  threshold.  In  order 
to  provide  ample  contrast  with  the  black-painted  arena,  identi¬ 
cal  discs  of  white  paper  are  affixed  to  the  uppermost  turret  of 
each  Khepera-II  robot  (see  Fig.  2).  Doing  so,  however,  nulli¬ 
fies  the  vision  system’s  ability  to  measure  the  orientation  of  the 
blobs,  as  there  is  no  asymmetry.  Additionally,  in  a  multi-robot 
setting  the  robots  are  indistinguishable  from  one  another,  requir¬ 
ing  a  limited-information  identification  method.  The  Cognex  vi¬ 
sion  system  provides  robust  and  accurate  positional  data,  but  if 
multiple  robots  are  present  in  the  field  of  view,  the  system  is  not 
able  to  uniquely  distinguish  them  from  one  another.  The  issue 
of  maintaining  track  of  individual  robots  in  order  to  properly  and 
consistently  apply  the  measurements  is  addressed  below. 


Range  measurements  across  the  arena  are  provided  by  two 
SICK  LMS400-1000  industrial  LADAR  Measurement  Systems, 
communicating  with  the  central  host  via  a  10MBit,  half-duplex 
Ethernet  (TCP/IP)  connection.  The  units  utilize  diode  lasers  and 
a  rotating  mirror  to  sweep  across  70°. 

The  LMS400  LADAR  unit  allows  the  user  to  configure  sev¬ 
eral  parameters  of  the  scan.  Scan  frequency  may  be  set  from 
360Hz-500Hz,  with  angular  resolution  ranging  from  0.1333°- 
1.0°.  Measured  distances  (at  each  angular  point)  are  reported 
from  5 00mm- 3000mm.  Range  resolution  is  1mm,  although  the 
reported  systematic  measurement  standard  error  is  4 mm  for  typ¬ 
ical  remission  values  of  40-100%,  reaching  as  high  as  10mm  for 
lower  remission  values.  Remission  values  below  6.5%  lead  to 
invalid  measurements,  reported  explicitly  as  0mm.  To  facilitate 
optimal  data  collection,  range,  edge,  median,  and  mean  filters 
may  be  set.  In  the  experiments  conducted,  these  capabilites  have 
not  been  exploited;  all  such  parameters  were  left  at  factory  de¬ 
fault  values. 

Two  LADAR  units  are  utilized  in  the  RAMA  Lab  swarm 
arena  for  three  basic  reasons.  The  first  two  are  the  minimum 
range  and  limited  detection  angle  of  the  LADAR  units.  Utilizing 
two  units  at  opposite  ends  of  the  arena  ensures  complete  cov¬ 
erage  of  the  entire  arena  by  at  least  one  sensor  (see  Figure  3). 
The  third  reason  is  the  susceptibility  to  line-of-sight  obstruction; 
that  is,  one  robot  may  occlude  one  (or  more)  other  robot(s)  lying 
within  the  wedge  defined  by  the  tangents  from  the  first  robot  to 
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Figure  4.  Partial  and  full  occlusion  of  distant  robots  by  a  proximal  robot. 
Image  is  approximately  to  scale. 


variable,  features  of  interest.  For  perceiving  specific  shapes,  ap¬ 
propriate  wavelets  must  be  used. 


Figure  5.  Geometric  relationships  for  arc  detection  in  LADAR  data.  Rk 
is  the  radius  of  the  Khepera-ll  robot.  The  angular  extent  of  the  robot  at 
range  OF  is  2©. 


the  center-point  of  the  LADAR  unit  (see  Figure  4). 

Each  of  the  two  SICK  LMS400  LADAR  sensors  returns  a 
set  of  range  data  covering  the  70°  “wedge”,  discretized  into  0.25° 
samples.  At  each  angular  point,  a  range  measurement  is  returned. 
This  data  is  simply  a  sweep  over  the  field  of  view  of  the  sensor, 
so  the  positions  of  the  robots  within  the  field  of  view  must  be 
extracted.  This  is  discussed  later  in  the  Arc  Detection  subsection 
of  the  ESTIMATION  METHODS  section. 

ESTIMATION  METHODS 

Due  to  the  nearly  uniform  cylindrical  shape  of  the  Khepera- 
II  robots,  it  is  not  possible  to  directly  extract  the  orientation 
from  the  range  data  returned  by  the  LADAR  sensors.  The  vision 
system  has  the  capability  of  returning  orientation  data  provided 
there  are  distinguishable  features  on  the  robot  when  viewed  from 
above;  however,  the  orientation  measurements  require  significant 
additional  processing  time  in  a  scalable  problem  such  as  swarm 
or  cooperative  robotics,  and  are  thus  disabled. 

Reliance  on  visual  pattern  recognition  for  extracting  orien¬ 
tation  and  identity  information  is  used  by  other  research  groups, 
e.g.,  see  [19,22],  as  well  as  in  earlier  experiments  by  our  group 
[23],  Robot  orientation  or  pose  is  also  estimated  under  many 
SLAM  implementations,  e.g.,  see  [10, 15].  Our  approach  elim¬ 
inates  the  need  for  complex  machine-vision  software  and  pro¬ 
cessing  capabilities. 

Algorithms  have  been  developed  to  allow  data  collected  by 
the  vision  and  LADAR  sensors  to  be  used  for  experimental  data 
collection  and  performance  validation  (metrology),  as  well  as  for 
feedback  to  the  swarm. 


The  authors  have  developed  another  method  for  arc  detec¬ 
tion,  akin  to  a  correlation  function.  With  knowledge  of  the  geom¬ 
etry  of  the  Khepera-II  robots,  it  is  possible  to  predict  the  signal 
returned  by  the  LADAR  sensors  if  a  robot  is  in  its  field  of  view. 
Simple  trigonometry  defines  the  angular  extent  (20)  of  the  robot 
at  a  given  range  p,  as  illustrated  in  Fig.  5.  The  formulae  and  al¬ 
gorithm  for  calculating  the  predicted  arc  are  given  in  Eq.(l).  The 

predicted  arc  is  stored  to  a  vector  S  =  {5  n- i . So . Sn-\ } 

of  length  N.  The  incremental  angle  used  in  the  algorithm  is  A©, 
and  the  estimated  range  to  the  robot  at  angle  0,  is  Si.  Relating 
Fig.  5  to  Eq.(l):  p  =  OF,  t|/,  =  <| );  =  LOCA\,  Rk  =  CAj  = 

CF,  Si  =  04,. 


©  =  arcsin( 


Rk 

p  +Rk  ' 


,2©x 
N  =  fl°or(^)) 

So  =  p 
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Arc  Detection 

The  range  data  from  the  SICK  LADAR  units  is  a  raw  set  of 
points,  so  the  data  must  be  processed  to  extract  the  locations  of 
the  robots.  Traditional  methods  for  extracting  features  from  this 
sort  of  1-D  data  include  wavelet  transforms  [24]  and  the  Circu¬ 
lar  Hough  Transform  [25, 26].  Wavelets  can  be  used  quite  suc¬ 
cessfully  for  locating,  within  a  set  of  data  over  an  independent 


At  short  ranges  just  larger  than  sensor  minimum  range,  the  robots 
extend  over  approximately  5.46°,  which  corresponds  to  21  sam¬ 
ple  points  at  0.25°  angular  resolution.  At  the  maximum  extent  of 
the  arena,  the  robots  exhibit  an  angular  extent  of  approximately 
1.8°,  extending  over  only  7  sample  points. 

Using  these  equations,  a  predicted  arc  is  created  for  every 
point  in  the  raw  data.  The  raw  data  surrounding  the  current 
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point  is  truncated  to  the  same  length  as  the  predicted  arc,  i.e., 
N  in  Eq.(l).  These  two  vectors  are  unit-normalized,  and  their 
inner-product  is  evaluated.  The  resulting  scalar  is  then  multi¬ 
plied  by  a  confidence  factor;  this  final  product  is  the  score  for 
that  range  point.  Empirically,  we  have  found  that  N  is  a  very 
good  measure  of  confidence,  giving  strong  confidence  at  short 
ranges  (with  many  points  to  fit  to  the  arc)  and  weak  confidence 
at  longer  ranges  (with  few  points  to  fit). 
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range  point 

Figure  6.  A  sample  score  vector  with  five  robots  in  the  field  of  view.  The 
right-most  peak  is  incorrectly  marked  as  a  match  since  the  algorithm  was 
configured  to  find  six  robots. 


Upon  completion  of  the  point-by-point  scoring  described 
above,  the  top  scores  are  selected.  The  score  data  is  clearly 
of  the  same  length  as  the  range  data,  in  our  case  280  points  (a 
sample  set  of  score  data  is  shown  in  Fig.  6).  The  arc  of  a  sin¬ 
gle  robot  clearly  extends  over  N  points  (N  appropriate  for  that 
range).  Thus  the  maximum  values  in  the  score  data  must  be  sepa¬ 
rated  by  the  appropriate  number  of  points.  When  combined  with 
a  smoothing  algorithm  this  method  has  proved  to  be,  although 
not  perfect,  quite  robust  at  identifying  robots  within  the  LADAR 
field  of  view. 

This  algorithm  requires  the  number  of  robots  to  be  known, 
so  in  cases  where  there  are  fewer  robots  in  the  field  of  view  (due 
to  range  limits  or  occlusion  as  discussed  above)  false  matches  are 
guaranteed.  The  handling  of  these  situations  is  discussed  in  the 
Tracking  subsection  below. 

2nd  Order  KF 

The  central  supervisor  utilizes  a  2nd-order  Kalman  Filter 
[27, 28]  to  maintain  the  four  estimated  states  of  each  robot.  In 
addition  to  estimating  the  x-y  position,  the  filter  estimates  the 
velocity  and  orientation  angle  of  each  robot. 

The  robots  are  non-holonomic,  thus  the  kinematic  model  can 
be  exploited  to  extract  the  latent  variable  0,  the  orientation  an¬ 
gle,  without  knowledge  of  the  control  input.  A  novel  addition 
to  the  typical  extended  Kalman  Filter  is  the  variation  of  the  pro¬ 
cess  noise  intensity  of  state  0  (the  robot  orientation)  as  a  function 


of  another  state  v  (robot  linear  velocity).  The  state  and  measure¬ 
ment  models  are  briefly  shown  below  in  Eqs.(2)-(3).  The  discrete 
model  we  use  in  the  2nd-order  Kalman  filter  is  defined  in  greater 
detail,  and  results  are  provided  and  discussed  in  [29], 
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In  the  constant-noise  model,  the  control  variables  v  and  0  are 
modeled  with  zero-mean  white  noises  ^v(f)  and  ^e(f),  respec¬ 
tively.  In  the  varying-noise  model,  ^e(f)  is  treated  as  a  function 
of  state  v,  the  linear  velocity.  This  varying-noise  model  is  dis¬ 
cussed  in  depth  in  [23,29].  Measurements  provided  by  the  vision 
system  and/or  LADAR  arc-detection  algorithm  are  absolute  po¬ 
sition  measurements  (xm  and  ym)  assumed  to  include  zero-mean 
Gaussian-distributed  noise  wx  ~  N(0,Wx)  and  wy  ~  N(0,Wy). 
This  measurement  model  is  defined  by  Eq.  (3). 

The  prediction  is  performed  at  each  time  step,  initially  up¬ 
dating  states  before  applying  the  correction.  These  estimates  are 
then  propagated  and  updated  in  typical  form.  Without  measure¬ 
ments,  the  system  propagates  only  via  the  nonlinear  state  transi¬ 
tion  function.  In  update  steps  where  measurements  are  available 
and  valid,  the  innovation  is  performed,  utilizing  the  2nd  order 
Taylor  expansion  of  the  dynamic  model  in  Eq.(2).  (The  inter¬ 
ested  reader  is  directed  to  references  [27-29]  for  further  details.) 


Tracking 

When  multiple  robots  are  present  in  the  field-of-regard  for 
the  LADAR  and  vision  systems,  the  order  of  measurements  may 
be  rearranged  from  sample  to  sample.  Additionally,  there  may  be 
occasional  excessive  error  in  the  measurements  due  to  a  sensor 
glitch  or  communication  error,  or  false  identifications  returned 
either  by  the  arc-detection  described  above  or  by  the  vision  sys¬ 
tem.  A  matching  and  validation  algorithm  has  been  developed  to 
detect  and  correct  these  occurrences  to  the  extent  possible. 

The  matching  portion  of  the  algorithm  is  based  on  the  Hun¬ 
garian  Algorithm,  first  identified  by  Kuhn  [30]  and  further  devel¬ 
oped  by  Munkres  [31].  The  algorithm  solves  the  minimum-cost 
assignment  problem.  We  treat  the  error  between  a  measurement 
and  an  estimate  as  the  cost,  and  the  algorithm  selects  the  order 
that  minimizes  the  error. 

If  one  or  more  of  the  measurements  exhibits  error  that  is  too 
large  due  to  background  clutter  or  other  error,  those  measure- 
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ments  would  still  be  applied  erroneously  in  the  estimation  algo¬ 
rithm.  The  method  for  data  validation  is  adapted  from  Leonard’s 
geometric  beacon  tracking  method  of  localization  [9]. 

At  each  prediction  step  k  of  the  central  supervisor’s  EKF,  a 
validation  gate  g  is  established  around  the  estimated  output  of 
the  ;-th  robot,  Z'k+l  =  CX‘k.  The  range  of  the  validation  gate  is 
dependent  on  the  confidence  in  the  prediction,  i.e.,  the  predicted 
error  covariance  of  the  estimate,  S.  At  time-step  k,  the  predicted 
state  for  the  /-th  robot  is  compared  to  the  /-th  measurement,  Zj;. 


Vij=zl-Z> 

g2>VijSrjX 


(4) 


Leonard  refers  to  the  size  of  gate  g  as  a  “number  of  stan¬ 
dard  deviations”  of  acceptable  error.  Bar-Shalom  [32]  notes  in¬ 
stead  that  the  confidence  region  defined  by  g  is  %2 -distributed, 
and  thus  the  dimension  of  the  measurement  vector  must  be  con¬ 
sidered.  For  our  2-dimensional  measurement  vector,  a  one-sided 
confidence  region  of  68.2%  (roughly  equivalent  to  one  standard 
deviation)  is  given  by 

g2=Xil 0.318]  «  2.29 
«wL51 

For  the  matching  to  be  successful,  the  Kalman  Filter  de¬ 
scribed  above  must  maintain  estimates  within  a  certain  error 
bound.  Experiments  and  simulations  at  the  RAMA  Lab  continue 
in  the  search  for  the  appropriate  relationships  between  accept¬ 
able  error  within  the  KF  estimate  and  the  appropriate  value  for  g 
that  is  not  too  restrictive.  The  lowest  value  found  to  work  with 
the  real  data,  described  below,  which  provides  rejection  of  bad 
data  but  allows  tracking  is  g  «  2,  corresponding  to  a  confidence 
region  of  86.5%. 

Metrology 

In  a  swarm  experiment,  external  observation  of  the  entire 
swarm  is  desirable  to  quantify  the  performance  of  the  algorithm 
as  well  as  to  verify  the  completion  of  the  task.  For  purposes  such 
as  this,  it  is  especially  beneficial  to  use  a  sensor  (or  set  of  sen¬ 
sors)  whose  observational  range  extends  over  the  entire  region  of 
interest  such  that  all  robots  may  be  observed  simultaneously. 

Under  these  considerations,  the  vision  system  implemented 
here  is  ideal.  Its  field  of  view  extends  beyond  the  bounds  of  the 
swarm  arena,  and  none  of  the  robots  is  ever  occluded.  Addition¬ 
ally,  the  camera  provides  accurate  position  measurement  (a  « 
0.03 mm)  at  reasonable  sampling  rate  («  1.5Hz  for  6  robots). 

The  major  limitation  of  the  vision  system  is  in  the  scalabil¬ 
ity:  the  more  robots  are  present  in  the  field,  the  longer  the  sample 
period  becomes  as  the  system  processes  the  images.  The  use  of 
blob  detection  ensures  that  the  sample  period  is  minimized. 

The  LADAR  system  offers  a  trade-off,  providing  a  higher 
sample  rate  that  is  independent  of  the  number  of  robots  in  the 


field  as  long  as  the  communication  path  is  not  a  bottleneck.  As 
this  data  is  processed  off-sensor,  the  sensor  sample  rate  is  max¬ 
imized.  The  sacrifices  to  achieve  this  data  rate  are  the  limited 
and/or  occluded  field  of  view,  requiring  two  (or  more)  sensors  to 
be  used  for  full  arena  coverage,  and  the  increased  sensor  noise 
(4mm  standard  error  versus  0.03 mm  for  the  vision  system). 


Feedback 

In  the  feedback  setting,  the  timeliness  of  data  samples  and 
associated  processing  time  becomes  critical.  We  are  currently 
performing  experiments  in  which  position  data  from  the  external 
sensors  is  provided  to  the  robots  at  a  reduced  data  rate.  Using 
an  odometry  calibration  method  based  on  Larsen’s  and  Marti- 
noli’s  augmented  Kalman  Filter  method  [3,5],  simulations  have 
shown  that  the  robots  require  very  low  update  rates  to  maintain 
localization  accuracy.  Experiments  are  underway  to  verify  the 
performance  in  our  real  environment. 


RESULTS 

Data  was  collected  from  the  Cognex  InSight  5400  vision 
system  with  a  single  robot  moving  under  the  Braitenberg  ob¬ 
stacle  avoidance  algorithm  built  in  to  the  Khepera-II  standard 
library.  The  implementation  is  a  variant  of  Braitenberg’s  Ve¬ 
hicle  3b,  called  Like  [33].  The  resulting  trajectories  are  semi- 
deterministic  and  depend  strongly  on  the  initial  pose  relative 
to  the  arena  geometry  as  well  as  the  variation  among  the  on¬ 
board  IR  proximity  sensors.  These  data  sets  were  gathered  us¬ 
ing  a  distinct  identifier  disc  on  the  robot  displaying  a  spatial  pat¬ 
tern  such  that  the  vision  system  can  make  direct  measurements 
of  orientation.  This  was  desired  to  have  a  truth  measurement 
for  comparison  with  the  estimated  angle  from  the  modified  2nd 
order  Kalman  Filter.  The  trajectories  are  plotted  in  Figure  7. 
The  figures  exclude  measurements  that  are  invalid  due  to  mis- 
identification  or  failed  pattern-matching.  Those  bad  points  are 
identified  in  the  algorithm  and  are  treated  as  ‘skipped’  measure¬ 
ments. 

The  orientation  of  the  robots  in  these  trajectories  was  es¬ 
timated  under  the  model  described  above  using  both  a  state- 
varying  noise  model  and  a  constant  noise  model.  The  results 
are  given  in  Table  1 . 


constant 

varying 

%better  (worse) 

Set  1 

0.1382 

0.1475 

(6.75) 

Set  2 

0.2212 

0.2030 

8.23 

Set  4 

0.1276 

0.1389 

(8.85) 

Set  5 

0.2155 

0.2005 

7.00 

Table  1.  Estimation  scores  (RMS  errors)  for  the  constant-noise  and 
varying-noise  models,  expressed  in  radians. 
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Trajectory  1 


Trajectory  2 


Figure  7.  Four  of  the  five  measured  trajectories. 


The  relative  performance  of  the  varying-noise  model  com¬ 
pared  to  the  constant-noise  model  was  evaluated  by  comparing 
the  mean  squared  error  between  the  measured  and  estimated  an¬ 
gle,  i.e.,  y  Lt  (mod(0^  —  §/.,  27t))2.  The  modulo  function  is  crit¬ 
ical  to  account  for  angle-wrapping,  i.e.,  the  fact  that  0  =  0  +  2k. 


CONCLUSION 

A  functional  arena  for  experiments  in  swarm  robotics  has 
been  established.  Sensing  and  estimation  capabilities  have  been 
implemented,  providing  good  metrology  results  as  well  as  the 
ability  to  provide  online  feedback  to  swarm  members.  Prelim¬ 
inary  experiments  have  shown  that  this  testbed  will  provide  a 
good  basis  for  future  swarm  research  and  experiments  at  the 
RAMA  Lab. 

Future  development  plans  include  improved  communica¬ 
tion  ability.  Under  consideration  is  the  use  of  both  RTs  and 
HSRTs,  for  robot-robot  and  robot-supervisor  communication  re¬ 
spectively.  Alternatively,  GumStix  small-form-factor  computers 
may  be  used  alongside  the  HSRT  (using  Bluetooth  communica¬ 
tion)  to  allow  robot-to-robot  communication.  This  option  allows 
the  inclusion  in  our  facility  of  other  robots  with  Bluetooth  com- 


9vs 

traj=1 ,  Qv=6. 0.14753 


Figure  8.  A  sub-sample  of  trajectory  set  1  comparing  Qtruth  to  0  for  the 
varying-noise  model. 

munication  ability,  such  as  the  iRobot  Create. 

This  testbed  and  the  estimation  methods  we  have  developed 
present  us  with  opportunities  for  real-world  validation.  The  es¬ 
timation  algorithms  have  been  developed  to  be  independent  of 
the  specific  sensors.  The  arc -detection  method  may  be  applied 
to  other  sets  of  range-data  and  other  shapes  to  be  detected.  The 
orientation  estimation  may  be  applied  to  any  sensor  that  provides 
2D  position  data  (e.g.,  vision,  GPS,  RFID  tag/reader,  ultrasonic, 
etc.),  maximizing  our  choices. 
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A. 2. 5  Robotic  Pose  Estimation  via  an  Adaptive  Kalman 
Filter  using  State- Varying  Noise 

The  following  paper  (on  the  next  7  pages)  was  co-authored  by  Gregory  K.  Fricke, 
Dejan  Lj.  Milutinovic,  and  Devendra  P.  Garg,  and  appeared  in  the  Proceed¬ 
ings  of  the  2009  IASTED  Robotics  and  Automation  Conference,  published  and 
presented  in  November  2009. 
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ABSTRACT 

Swarm  robotics  offers  the  promise  of  enhanced  perfor¬ 
mance  and  robustness  relative  to  that  of  individual  robots, 
with  decreased  cost  or  time-to-completion  for  certain  tasks. 
Having  many  degrees  of  freedom,  the  swarm  related  con¬ 
trol  and  estimation  problems  are  quite  challenging,  specifi¬ 
cally  when  the  solutions  involve  a  large  amount  of  commu¬ 
nication  among  the  robots.  Under  certain  sensing  modali¬ 
ties,  direct  measurement  of  robot  orientation  may  either  be 
not  possible  or  require  excessive  sensor  processing.  In  this 
paper,  a  novel  method  is  presented  to  vary  process  noise  in¬ 
tensity  as  a  function  of  an  estimated  state  in  order  to  arrive 
at  the  hidden  state  robot  orientation.  Experimental  results 
are  provided,  demonstrating  the  efficacy  of  the  method  as 
well  as  the  error  reduction  relative  to  fixed-noise  estima¬ 
tion. 

KEY  WORDS 

Collective  robotics,  Pose  estimation,  Kalman  filter. 
Stochastic  estimation,  Mobile  robotics,  Differential-drive 

1  Introduction 

Robotic  pose  estimation  is  critical  for  proper  feedback  con¬ 
trol  of  robot  trajectory.  Pose  estimation  and  robot  localiza¬ 
tion  have  been  studied  extensively,  utilizing  methods  such 
as  dead-reckoning  [1-6],  on-board  and  off-board  computer 
vision  [7,  8],  environmental  sensing  [9-11],  collective  or 
distributed  localization  [12,  13],  and  simultaneous  local¬ 
ization  and  mapping  (SLAM)  [14]  using  laser-scanning 
systems  [15,  16],  on-board  cameras  [17],  or  ultrasonic 
rangefinders  [11],  Most  of  the  above  works  rely  on  some 
variant  of  the  Kalman  Filter  [18]  or  Extended  Kalman  Fil¬ 
ter  [19], 

Dead-reckoning  solutions  are  notoriously  susceptible 
to  accumulated  error  due  to  modeling  errors  or  unmod¬ 
eled  dynamics  such  as  wheel-slippage  or  uneven  surfaces. 
The  above  references  include  several  methods  for  odome- 
try  correction  or  calibration,  all  of  which  require  some  sec¬ 
ondary  form  of  sensing  for  feedback  correction. 

Computer  vision-based  localization  systems  gener¬ 


ally  come  in  two  flavors:  off-board  and  on-board.  Off- 
board  implementations  feature  either  high-speed  sensors 
with  high-complexity  pattern  recognition  and  tracking  soft¬ 
ware  (requiring  extensive  computational  capability  and  ad¬ 
vanced  machine-vision  techniques)  or  simpler  vision  sys¬ 
tems  reliant  on  known  configurations  or  patterns  on  the 
robot(s)  being  tracked  (see,  e.g.,  [20,  21]). 

On-board  solutions  rely  on  feature  tracking  in  the  en¬ 
vironment,  requiring  that  the  environment  contains  suffi¬ 
cient  features  to  identify  and  track.  Early  work  required 
a  priori  known  environmental  maps  or  beacon  locations 
[7,  9];  more  recent  SLAM  methods  build  the  environmen¬ 
tal  map  as  the  robot  moves  [14,  16].  Similar  environmen¬ 
tal  constraints  are  placed  on  laser-scanning-  and  ultrasonic 
rangefinder-based  SLAM  implementations.  Additionally, 
on-board  solutions  necessarily  place  significantly  higher 
sensing  and  computation  requirements  on  the  individual 
robots. 

1.1  Problem  formulation 

This  research  group,  performing  research  in  collective 
robotics  and  decentralized  formation  control,  seeks  to  per¬ 
form  algorithmic  development,  test,  and  evaluation  in  the 
presence  of  real-world  noise  with  minimal  equipment  over¬ 
head.  As  a  surrogate  for  significantly  more  complex  sens¬ 
ing  systems  on-board  the  robots,  measurements  of  the  sys¬ 
tem  will  be  made  by  external  position  sensors  and  made 
available  to  the  individual  robots  via  direct  or  broadcast 
wireless  communication.  Such  global  information  sharing 
may  be  considered  analogous  to  the  information  sharing 
utilized  in  [22],  If,  for  a  particular  experiment,  sensing 
or  communication  range  is  assumed  to  be  limited  for  indi¬ 
vidual  robots,  direct  central-computer-to-robot  communi¬ 
cation  could  be  configured  to  only  report  information  that 
the  robot  would  accordingly  have  access  to  if  the  sensing 
ability  were  on-board.  This  type  of  experimental  environ¬ 
ment  may  be  called  mixed-simulation  testing,  indicating 
a  combination  of  real  and  simulated  components.  Such 
testbeds  are  used  extensively  for  validation  of  complex  sys¬ 
tems  such  as  spacecraft  [23]. 
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The  robotic  agents  forming  the  basis  of  the  prelimi¬ 
nary  testbed  development  are  Khepera-II  differential  drive 
robots  from  K-Team.  External  observations  of  robots  in 
the  testbed  are  provided  by  two  SICK  LMS400  LADAR 
sensors  and  an  overhead  Cognex  Insight  5400  industrial 
machine-vision  system.  The  LADAR  sensors  provide  mea¬ 
surement  of  the  x  —  y  position  of  multiple  agents  within 
the  swarm  arena.  However,  due  to  the  nearly  uniform 
cylindrical  shape  of  the  Khepera-II  robots  when  viewed 
from  the  side,  it  is  not  possible  to  directly  measure  the 
agent  orientation  0  from  the  range  data  returned  by  the 
LADAR  sensors.  The  vision  system  has  the  capability  of 
returning  orientation  data  provided  there  are  distinguish¬ 
able  features  which  correlate  with  the  robot  orientation 
when  viewed  from  above.  Measuring  the  robot  orientation 
in  this  way  requires  significant  additional  processing  time, 
especially  in  the  case  of  scalable  problems  related  to  swarm 
and  cooperative  robotics. 

In  this  paper  we  present  an  alternative  approach  in 
which  the  orientation  of  the  agent  is  treated  as  a  hidden 
variable,  and  is  estimated  through  observation  of  the  robot 
motion.  The  dynamic  model  of  the  robot  (specifically, 
the  non-holonomicity)  is  then  exploited  to  appropriately 
vary  the  process  noise  covariance  used  in  the  extended 
Kalman  Lilter  (EKL).  Recent  papers  have  discussed  co- 
variance  variation  via  fuzzy  logic  and  with  dependence  on 
time-step  k  [1 1,  24];  the  authors  of  this  paper  are  not  aware 
of  prior  work  utilizing  state  estimates  to  vary  the  covari¬ 
ance. 


2  Model  Definition 


The  pose  of  an  individual  agent  in  a  swarm  is  defined  by  the 
coordinates  [x,  y,  0]T,  where  x  and  y  are  the  agent  coordi¬ 
nates  in  a  rectangular  reference  frame,  and  9  is  the  orienta¬ 
tion  angle  with  0  =  0  lying  on  the  ./.'-axis.  The  agent  pos¬ 
ture  changes  via  the  agent’s  on-board  control  loop,  which 
defines  the  agent’s  linear  acceleration  v  and  angular  veloc¬ 
ity  9.  The  dynamic  model  of  the  robot  used  in  this  paper  is 
given  in  state  space  form  in  Eqs.(l-3). 
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Knowledge  of  these  control  inputs  v  and  9  would 
greatly  decrease  the  error  in  the  estimation.  In  many  cases. 


such  as  the  one  described  in  this  paper  for  an  external  ob¬ 
server,  the  control  inputs  are  unknown.  Lor  this  reason, 
the  control  variables  v  and  9  are  modeled  with  the  zero- 
mean  white  noises  £„(f)  and  £g(f),  respectively.  Mea¬ 
surements  provided  by  the  vision  system  and/or  LADAR 
pair  are  absolute  position  measurements  xm  and  yrn  as¬ 
sumed  to  include  zero-mean  Gaussian-distributed  noise 
wx  ~  -/V(0,  Wx)  and  wy  ~  N( 0,  Wy).  This  measurement 
model  is  defined  in  Eq.(4). 
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We  utilize  stochastic  signals  as  a  general  model  of  the 
unknown  control  signals.  The  advantage  of  modeling  the 
unknown  control  signals  as  stochastic  signals  is  that  we  can 
exploit  Kalman  filter  theory  to  estimate  unknown  speed  v 
and  angle  9  based  only  on  the  position  measurements  xm 
and  ym. 

We  initially  assume  that  the  white  noise  signals  that 
model  the  control  variables  are  uncorrelated.  To  establish 
the  baseline,  we  also  assume  that  they  are  of  constant  in¬ 
tensity.  In  the  rest  of  this  paper  we  will  consider  the  pos¬ 
ture  estimation  based  on  this  constant  intensity  model  as 
compared  to  another  model  in  which  the  noise  intensity 
varies  as  a  function  of  estimated  robot  velocity. 

The  rationale  behind  the  velocity-varying  models  for 
estimation  lies  in  the  intuition  that  the  largest  changes  in 
orientation  are  likely  to  be  experienced  when  the  veloc¬ 
ity  v  is  small.  This  is  due  to  the  observation  that  for  a 
differential-drive  robot,  fast  orientation  changes  result  from 
large  differences  in  wheel  speeds.  Lor  such  a  system  with  a 
limited  dynamic  range  of  wheel  speeds,  the  maximum  rate 
of  orientation  change  will  occur  when  the  two  wheels  are 
each  driven  at  the  maximum  wheel  speed  but  in  opposite 
directions,  resulting  in  zero  linear  velocity.  At  the  other 
extreme,  maximum  linear  velocity  is  achieved  when  both 
wheels  are  driven  at  maximum  speed  in  the  same  direction, 
resulting  in  zero  angular  velocity.  Additionally  from  an  ob¬ 
servational  standpoint,  in  the  limit  as  velocity  goes  to  zero, 
changes  in  xm  and  yrn,  will  be  zero  or  undetectable  while 
the  orientation  may  change  in  range  [0, 2n\. 


2.1  Orientation  Estimation 

Our  proposed  method  is  based  on  the  2nd-order  Kalman 
filter  [19]  for  non-linear  systems.  Individual  agents  in 
the  swarm  are  characterized  by  the  state  vector  X  = 
[x,  y,  v,  9}T,  and  the  continuous  state  space  model  is  de¬ 
fined  by  Eqs.(l-3).  The  Kalman  filter  is  based  on  two  steps: 
the  prediction  and  the  update.  The  prediction  and  update 
step  equations  are  presented  below  without  derivation.  Ad¬ 
ditional  details  are  available  in  references  [18,  19]. 
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In  the  prediction  step,  the  computation  of  the  pre¬ 
dicted  value  X  of  the  state  vector  and  its  covariance  are 
based  on  the  last  estimation  X.  Since  we  are  dealing  with 
data  that  is  regularly  sampled  with  sampling  period  A t,  we 
use  the  discrete  time  version  of  the  2nd-order  filter.  The 
subscript  k  indicates  the  iteration  step;  the  predicted  state 
vector  at  the  k  +  1  sample  is  given  by 

Xk+i  =  E-^k  P  {Xk+i] 

=  Ejtk<p{Xk  +  F(Xk)At} 

where  the  expectation  is  in  regard  to  Gaussian  distribution 
with  the  mean  value  Xk  and  corresponding  covariance  P. 
Using  the  Taylor  Series  expansion  up  to  the  second-order 
terms  and  including  E-%  P{Xk  —  Xk)  =  0  the  above  ex¬ 
pression  can  be  rewritten  as 
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The  process-noise  covariance  matrix  Qk  consists  of  the  ve¬ 
locity  increment  (vk+i  —  vk )  ~  7V(0,  2J!)  and  the  incre¬ 
ment  of  the  angle  (6k+ i  —  0k)  ~  iV(0,  2®).  By  including 
the  control  inputs  for  v  and  9  as  noise,  the  magnitudes  of 
2J!  and  2®  are  likely  larger  than  would  otherwise  be  re¬ 
quired,  but  with  the  benefit  of  a  method  that  is  independent 
of  communication  with  the  robot. 

In  update  steps  (performed  when  measurements  are 
available)  we  correct  the  predicted  state  estimation  based 
on  the  measurements.  The  update  step  contains  the  follow¬ 
ing  computations. 


Xk+i  —  Xk  +  F(Xk)At+ 
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E  ^  ^  fa(Xk  ~  Xk)T Fi(Xk  -  Xk) 
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in  which  the  vectors  0j  are;  (f>i  =  [1  0  0  0]T,  <j> 2  = 
[0  1  0  0]T,  </>3  =  [0  0  1  0]T,  </)4  =  [0  0  0  1]T  (i.e.,  typi¬ 
cal  orthonormal  basis  vectors  of  ‘ft4)  and  the  matrix  F%  is 

[Fi)mu  -  fc) At,  m,n  =  1, 2, 3, 4.  (7) 


dxmdxn 


The  notation 


denotes  the  element  in  row  m  and 


column  n.  We  can  then  rewrite  Eq.(6)  as  shown  in  Eq.(8). 
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Xk+i=Xk+F(Xk)At+-Y/fatr{FPk}  (8) 

2=1 

The  square  matrix  denotes  the  covariance  matrix  of  the 


'{( Xk-Xk)T(Xk-Xk )}. 


state  estimate,  i.e.,  Pk  =  E  ■ 

From  Eq.(2)  and  Eq.(7),  we  obtain  F3  =  F4  =  0,  and  F\ 
and  F2  as  defined  in  Eqs.(9-10). 
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Based  on  Pk  we  can  also  predict  the  covariance  matrix  of 
the  state  estimate  Pk+ 1  via  Eq.(l  1). 

Pk+i  =  AkPkAk  +  Qk  (11) 

The  state-transition  matrix  Ak  is  defined  in  Eq.(12). 
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Sk+ 1  —  CPk+\CT  +  R 
Kk+i  =  Pk+i^S^ 

Xk+ 1  =  Xk+i  -(-  Kk+i{Zk+i  —  CXk+i) 
Pk+ 1  =  (I  ~  Kk+iC)Pk 


The  covariance  matrix  of  the  measurement  noise,  R,  is 
given  below. 
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2.2  Variation  of  Orientation  Covariance 

As  discussed  above,  it  is  likely  that  increments  of  9  will  be 
larger  as  v(t)  — t  0.  For  this  reason  instead  of  only  consid¬ 
ering  variance  2®.  constant,  we  also  consider  the  variance 
as  a  function  of  velocity  v,  i.e.,  2®  =  ’Erk(vk)  ■ 

In  Eq.(ll)  and  Eq.(13)  above,  the  process  noise  co- 
variance  matrix  Qk,  and  more  specifically  2®,  is  noted  to 
vary  from  step  to  step,  and  as  such  must  be  re-evaluated 
after  each  update.  Here  we  denote  the  dependence  of  [Q] 44 
on  v,  such  that  it  achieves  its  maximum  value  for  v  =  0  and 
decreases  monotonically  to  [Q]  44  =  0  at  v  =  v.  The  pro¬ 
posed  function  defining  the  relationship  between  2®  and  v, 
given  in  Eq.(16),  is  based  on  the  observed  empirical  his- 
togrammatic  correlation  between  v  and  A0  over  several 
data  sets. 

Eek(vk)  =  arccos2[(^)4]  (16) 

v 

This  function  requires  one  tuning  parameter,  v.  Af¬ 
ter  performing  several  optimization  iterations  using  the 
Matlab®Optimization  Toolbox,  we  observed  that  the  nor¬ 
malization  parameter  correlates  well  with  the  histogram- 
matic  peak  of  v  for  a  given  trajectory.  It  is  clear,  however, 
that  for  such  a  definition  of  v,  Vk  will  almost  certainly  ex¬ 
ceed  v  at  times,  in  which  case  the  arccos  function  returns 
non-real  values.  Protections  have  thus  been  implemented 
in  the  software  to  ensure  real  values.  The  practical  impli¬ 
cation  is  that  |y|  >  v  — >  2®  (vk)  =  0. 
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Figure  1:  Overhead  view  of  Khepera-II  arena.  Image 
taken  from  Cognex  Insight  5400  machine  vision  sensor, 
showing  several  Khepera-II  robots  with  unique  identifier 
IDs  (for  truth  measurements). 

3  Experimental  Data 

The  method  of  orientation  estimation  described  in  this  pa¬ 
per  has  been  implemented  in  the  Khepera-II  testbed  within 
our  experimental  laboratory  [25].  The  testbed,  shown  in 
Figure  1  and  Figure  6  has  been  constructed  for  the  purpose 
of  studying  and  validating  swarm-  and  cooperative-robotics 
control  methodologies  in  the  presence  of  true  sensor  noise, 
and  communication  uncertainties  and  limitations. 

Experimental  data  sets  with  the  Khepera-II  robots, 
SICK  LMS-400  LADAR  sensors,  and  the  Cognex  InSight 
5400  vision  system  have  been  collected  to  provide  real  in¬ 
puts  to  the  algorithms.  The  data  thus  includes  real-world 
corruption  such  as  dropped  or  misread  measurement  pack¬ 
ets,  non-real  values,  noise  due  to  environmental  conditions, 
and  sensor  noise. 

Data  was  gathered  via  the  Cognex  InSight  5400  vision 
system  with  a  single  robot  present  within  the  arena,  moving 
under  the  Braitenberg  obstacle  avoidance  algorithm  built  in 
to  the  Khepera-II  standard  library.  The  implementation  is 
a  variant  of  Braitenberg’s  Vehicle  3b,  also  called  Like  [26]. 
The  resulting  trajectories  are  non-deterministic  and  depend 
strongly  on  the  initial  orientation  relative  to  the  arena  ge¬ 
ometry,  as  well  as  on  the  variation  among  the  on-board  IR 
proximity  sensors.  Five  data  sets  were  collected  with  the 
robot  moving  in  this  manner.  One  of  these  sets  was  chosen 
arbitrarily  to  tune  the  parameter  v.  The  training  trajectory 
is  shown  in  Figure  2.  The  remaining  trajectories  are  plotted 
in  Figure  3. 

These  initial  data  sets  were  gathered  using  distinct 
identifier  discs  atop  the  robots  (as  visible  in  Figure  1)  such 
that  direct  measurement  of  robot  orientation  would  be  pos¬ 
sible  and  available  for  comparison.  These  sets  were  used  to 
validate  methods  without  the  concerns  of  processing  time 
for  the  vision  system.  These  data  sets  contain  only  data 
from  the  Cognex  vision  system.  Four  additional  data  sets 
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Figure  2:  The  robot  trajectory  used  to  find  the  optimal 
value  of  v.  The  figure  excludes  invalid  measurements, 
which  are  identified  in  the  algorithm  and  treated  as 
‘skipped’  measurements. 

were  taken  with  a  stationary  robot  in  different  locations  to 
quantify  the  sensor  noise  covariances  (i.e.,  measurement 
noise  R),  including  evaluation  of  whether  the  sensor  er¬ 
ror  is  a  function  of  location  within  the  arena.  For  the 
Cognex  vision  system,  location  does  not  significantly  af¬ 
fect  the  noise  covariance;  the  SICK  LADAR  sensor  noise 
does  vary  significantly  across  the  field  of  view.  Results  are 
given  below  for  the  data  collected  by  the  Cognex  vision 
system. 

4  Experimental  Results 

Simulations  conducted  with  the  above  described  data  sets 
as  inputs  allowed  the  tuning  of  model  parameters,  as  the  es¬ 
timated  states  could  be  directly  compared  to  the  measured 
states.  The  relationship  between  the  noise  intensity  E®,  and 
estimated  velocity  proposed  in  Eq.(16)  has  been  optimized. 
The  tuning  parameter  for  this  velocity-dependent  noise  pro¬ 
file  was  optimized  under  an  arbitrarily  chosen  trajectory  set 
from  the  single-robot  data.  For  a  fair  and  legitimate  com- 
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Figure  4:  candidate  function. 


Figure  3:  Four  of  the  five  measured  trajectories.  The 
figures  exclude  invalid  measurements,  which  are  identified 
in  the  algorithm  and  are  treated  as  ‘skipped’ 
measurements. 


parison,  the  fixed-noise  covariance  magnitude  was  also  op¬ 
timized  under  the  same  trajectory.  The  varying-noise  func¬ 
tion  is  plotted  in  Figure  4,  with  the  fixed-noise  value  also 
plotted  for  comparison.  As  seen  in  the  plot,  the  optimized 
parameter  is  v  =  756mm/s. 

The  relative  performance  of  the  varying-noise  model 
compared  to  the  constant-noise  model  was  evaluated  by 
comparing  the  mean  squared  error  between  the  measured 

and  estimated  angle,  i.e.,  (mod(0  —  9:  2tt))2.  The 

modulo  function  is  critical  to  account  for  angle- wrapping; 
i.e.,  to  account  for  the  fact  that  9  =  9+2n.  Table  1  gives  the 
scores  (RMS  error,  expressed  in  radians)  for  the  constant- 
noise  and  variable-noise  models  for  each  of  five  trajecto¬ 
ries.  The  greatest  score  improvement  (error  reduction)  is 
for  Set  3,  which  was  used  as  the  training  set  for  optimiz¬ 
ing  v.  Figure  5  illustrates  the  difference  between  measured 
and  estimated  orientation.  The  measured  data  is  real  and 
includes  noise,  evidenced  by  some  non-smooth  segments. 


5  Applicability 

This  methodology  can  be  applied  on  similar,  differential- 
driven  robot  system  utilizing  position-only  feedback.  Use 
of  this  method  has  the  potential  to  greatly  reduce  the  sens¬ 
ing  requirements  in  situations  where  orientation  informa¬ 
tion  is  desired.  For  example,  for  beacon-,  a-priori- known 
map-,  or  time-of-flight-based  navigation  systems  could  re¬ 
duce  the  number  of  beacons,  features,  transmitters,  etc., 
such  that  only  position  measurements  are  required. 

In  this  paper,  it  is  asserted  that  this  estimation  method 
is  possible  due  to  the  physical  constraints  of  a  differential- 
drive  vehicle.  However,  experiments  have  been  performed 
only  with  a  single  trajectory  control  method,  and  thus  this 
claim  remains  unproven.  Continuing  experiments  and  sim¬ 
ulations  seek  to  answer  the  question  of  whether  this  method 
is  applicable  under  other  trajectory  control  paradigms. 

These  experiments  have  focused  solely  on  estimation 
via  an  external  observer,  alleviating  high-throughput  com¬ 
munication  of  control  input  by  the  robot  to  the  observer. 
If  this  method  is  implemented  on-board  the  robot,  utiliz¬ 
ing  its  own  exteroceptive  sensing  ability  to  yield  x-y  mea¬ 
surements,  the  control  inputs  would  be  available  to  the  es¬ 
timation  algorithm  as  well,  thus  significantly  reducing  the 
required  magnitude  of  the  process  noise  injected  into  the 
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evs6est 

traj=1 .  Qv=6,  0.14753 


traj=3,  Qv=B,  0.18812 


Figure  5:  9iruth  vs.  9  for  varying-noise  model,  trajectory 
sets  1  and  3.  Zoomed  in  to  show  detail. 

model. 

This  extension  of  the  method  allows  the  potential  of 
greater  scalability  with  multiple  vehicle  systems  as  the 
sensing  and  processing  requirements  are  drastically  re¬ 
duced.  Furthermore,  applications  in  holonomic  estimation 
and  control  may  be  possible.  For  example,  a  quadrotor 
aerial  vehicle  with  6DOF  could  exploit  a  system  such  as 
this  if  the  control  input  is  known. 

6  Conclusion 

The  method  of  varying  the  estimated  process  noise  covari¬ 
ance  of  the  orientation  state  as  a  function  of  velocity  proved 
to  work  admirably.  The  errors  observed  under  the  varying- 
noise  model  were,  for  three  of  the  five  test  trajectories,  sig¬ 
nificantly  lower  than  for  the  constant-noise  model.  Clearly 
though,  two  of  the  trajectories  result  in  increased  estima¬ 
tion  error  with  the  varying-noise  model.  These  results  are 
compelling,  and  suggest  that  such  a  varying-noise  model, 
appropriately  configured,  may  be  able  to  reduce  estimation 
error.  We  continue  to  study  the  differences  in  the  trajecto¬ 
ries  in  an  attempt  to  identify  cases  where  use  of  a  noise- 


Figure  6:  View  of  the  Khepera-II  lab  space,  showing 
relative  locations  of  Cognex  Insight  5400  camera  and  dual 
SICK  LMS400  ladar  sensors. 


varying  model  will  be  most  beneficial.  We  suspect  that  the 
dynamic  ranges  of  the  states  are  critical  to  the  convergence 
of  the  estimator;  a  potential  method  of  evaluating  the  dy¬ 
namic  ranges  of  the  trajectories  has  been  inspired  by  the 
numerical  conditioning  suggested  by  Antonelli  [6], 
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A. 2.6  Discrimination  and  Tracking  of  Individual  Agents 
in  a  Swarm  of  Robots 

The  following  paper  (on  the  next  6  pages)  was  co-authored  by  Gregory  K. 
Fricke  and  Devendra  P.  Garg,  and  appeared  in  the  Proceedings  of  the  2010 
American  Control  Conference  on  pages  2742-2747,  published  and  presented  in 
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Abstract — Multi-agent  robotic  systems  utilizing  one-to-one 
communication  between  individuals  necessarily  require  that 
each  robot  has  a  unique  identity,  such  that  shared  information 
is  appropriately  utilized.  When  used  with  a  central  computer 
and  external  vision  system,  various  methods  may  be  used  to 
identify  individuals,  including  unique  patterns.  This  requires 
a  priori  knowledge  of  the  patterns,  while  taxing  the  vision 
system.  Such  methods  are  discussed  along  with  their  limitations. 
An  alternative  method  is  proposed  to  alleviate  the  strain  on 
the  vision  system,  eliminating  the  need  for  prior  knowledge  of 
individual  patterns. 

I.  INTRODUCTION 

In  multiple -robot  systems  utilizing  one-to-one  commu¬ 
nication,  robots  must  be  assigned  unique  identities  within 
the  system  to  facilitate  appropriate  message  transmission.  In 
many  cases,  robots  using  wireless  communication  will  have  a 
unique  identity  established  via  some  hardware  method,  such 
as  a  multi -bit  switch,  or  a  MAC  address. 

For  a  central  supervisor  to  provide  feedback  and  unique 
information  to  specific  members  of  a  swarm,  the  system 
must  be  able  to  identify  the  robots,  associating  a  known 
or  observed  client  ID  with  a  detected  and  tracked  agent. 
Such  implementations  may  be  used  in  applications  such  as 
soccer-playing,  area-surveillance  or  -defense,  with  the  central 
supervisor  or  controller  managing  such  tasks  as  mission 
planning,  mode-switching,  map  construction,  or  message 
switchboarding.  For  information  exchange,  the  identity  of 
both  sender  and  receiver  must  be  known. 

The  identification  process  may  be  addressed  through  the 
use  of  special  tracking  tools.  High-cost  vision  systems  uti¬ 
lizing  custom  fiducial  markers  on  each  robot  are  available, 
reliable,  and  capable  of  tracking  both  air-  and  ground-based 
robots  (see,  e.g.,  [1]).  It  is  the  goal  of  this  research  work  to 
find  a  method  requiring  little  or  no  specialized  equipment. 

A  method  is  presented  that  relies  on  data  reduction  to 
provide  information  for  discrimination.  The  method  thus 
reduces  to  an  estimation  not  unlike  a  least-squares  fit.  A 
complete  presentation  of  a  system  utilizing  such  a  method 
is  given,  including  the  dependencies  between  estimation 
components. 

II.  Identification  via  Visually  Discriminatory 
Information 

In  collective-robotic  applications  with  a  central  observer 
or  controller,  it  is  common  to  use  features  on  the  robots 


Fig.  1 .  Overhead  view  of  Khepera-II  arena  in  the  RAMA  Lab. 


to  provide  identity  information  via  features  detectable  by  a 
visual  sensor.  Some  examples  include  color/shape  patterns  on 
RoboCup  “players”  [2]  or  other  swarm  robots  [3],  2D  bit- 
patterns  on  unmanned  ground  vehicles  (UGVs)  in  a  swarm 
implementation  [4],  and  many  others. 

The  Robotics  and  Manufacturing  Automation  (RAMA) 
Laboratory  at  Duke  University  has  developed  a  modular 
testbed  for  conducting  multi-robot  experiments  [5],  using 
Khepera-II  robots,  a  Cognex  Insight  5400  real-time  indus¬ 
trial  machine  vision  system,  and  two  SICK  LMS400-1000 
LADAR  sensors. 

The  SICK  LADAR  sensors  provide  only  range  data,  and 
thus  the  agents  must  be  identified  as  features  in  the  data.  An 
arc-detection  method  developed  at  the  RAMA  Lab  locates 
agents  within  a  set  of  range  data  [5].  The  measurements 
of  the  extracted  agents  include  only  x-y  position  data,  as 
the  Khepera-II  robots  are  cylindrical  with  no  distinguishing 
features  to  be  used  to  measure  orientation. 

The  Cognex  system  is  comprised  of  the  camera,  which 
is  a  standalone  computer  performing  image  capture  and 
processing,  and  a  desktop  computer  for  control  of  the  camera. 
With  no  image  processing.  The  camera  provides  black  and 
white  images  live  video  at  up  to  40 fps  with  negligible  lag. 
Pixel  resolution  ranges  from  2A3mm/pix  to  2.75 mm/pix 
across  the  arena. The  camera  may  be  configured  to  perform 
certain  image-processing  tasks  such  as  identification  of  user- 
defined  patterns.  Fig.  1  gives  a  sample  image  of  the  arena. 


978-1  -4244-7427-1/1 0/$26.00  ©2010  AACC 


2742 


Fig.  2.  Unique  IDs  for  Cognex  identification  via  pattern-recognition 


1)  Nine-bit  Identifier:  In  order  to  generate  position  and 
orientation  data  along  with  unique  identification  using  this 
camera  system,  distinguishing  features  were  added  to  the 
robots.  Several  9-bit  patterns,  generated  on  2 in  diameter 
circles,  are  shown  in  Fig.  2.  This  feature  size,  exceeding 
5 pix,  was  thought  to  be  sufficient  to  guarantee  robustness  in 
pattern  matching. 

Baseline  tests  were  performed  with  ten  patterns  to  select 
a  few  patterns  that  would  work  together.  In  these  tests,  all 
of  the  pattern  discs  were  placed  in  the  arena,  closely  spaced. 
The  pattern-recognition  software  was  then  enabled  for  one 
pattern  at  a  time.  This  series  of  tests  was  performed  with  the 
patterns  grouped  in  two  regions  of  the  arena:  towards  the 
left-most  edge  (min-y)  of  the  arena,  and  clustered  around 
the  middle  of  the  arena. 

For  a  swarm  of  only  four  members,  using  patterns  4,  6, 
7,  and  8,  the  data  sample  rate  was  reduced  to  approximately 
1  frame/sec  with  nearly  2  seconds  of  lag.  The  use  of  this 
type  of  pattern  recognition  for  identification  and  orientation 
clearly  places  a  severe  limitation  on  the  scalability  of  the 
swarm. 

Difficulty  in  recognition  arises  via  blur  due  to  focus  that, 
as  evident  in  Fig.  1,  varies  throughout  the  field  of  view 
(FOV),  and  off-orthogonal  pixelation  that  occurs  when  the 
robot  is  askew  to  the  cardinal  axes  of  the  camera  system. 
In  these  cases,  the  sharply  drawn  grid  boundary  lines  are 
no  longer  well-defined.  This  coupled  with  the  non-linear 
pixel  scaling  towards  the  edges  of  the  arena  easily  spoofs 
the  pattern-recognition  software.  Given  these  limitations,  an 
alternative  method  for  identification  is  sought. 

The  burden  on  the  vision  system  is  reduced  by  utilizing 
a  simple  background  contrast  comparison  and  area  (i.e., 
blob  size)  and  shape  (i.e.,  circular)  parameters.  In  order 
to  provide  ample  contrast  against  the  black  background 
of  the  arena,  identical  blank  discs  of  paper  are  attached 
to  the  uppermost  turret  on  each  robot.  This  negates  the 
measurement  of  orientation;  the  only  measurement  returned 
by  the  vision  system  is  position  data.  This  additionally  means 
that  the  multiple  agents  present  in  the  field  of  view  are  no 
longer  distinguishable.  This  utilization  of  the  vision  system 
allows  significantly  faster  image  sample  rate,  achieving  10Hz 
without  significant  lag. 

The  task  of  identification  becomes  an  estimation  problem. 
The  identification  of  individuals  will  be  accomplished  via 
comparison  of  the  central  supervisor’s  state  estimates  with 


the  individual  agents’  dead-reckoning  estimates.  The  agents 
have  limited  memory,  so  the  shared  data  must  be  limited,  i.e., 
it  should  not  contain  a  full  time  history  of  estimates.  For  this 
estimation  to  work,  several  independent  but  complementary 
estimation  methods  must  be  applied. 

III.  Acquisition,  Localization,  and  Tracking  for 
Identification 

The  solution  proposed  involves  the  interaction  of  several 
seemingly  independent  tasks.  The  external  observer  must 
acquire  the  swarm  members  in  the  image  as  blobs.  Ex¬ 
ternal  localization  is  in  some  cases  as  simple  as  mapping 
a  measurement  to  an  observed  robot;  in  this  case,  how¬ 
ever,  we  seek  to  estimate  the  unobservable  orientation  and 
predict  future  motion.  Tracking,  in  our  case,  refers  to  the 
correct  association  of  measurements  when  multiple  robots 
are  present.  The  agents  must  maintain  estimates  of  their  own 
positions  utilizing  dead-reckoning.  These  components  will 
be  applied  concurrently,  resulting  in  unique  associations  of 
communication  IDs  with  external  location  estimates. 

A.  External  Localization 

The  central  supervisor  uses  an  Extended  Kalman  Filter 
for  observation  and  estimation  of  four  states  for  each  agent. 
Once  the  set  of  individuals  is  defined,  the  central  supervisor 
uses  the  measurements  as  updates  to  the  estimated  state. 
Individual  agents  in  this  swarm  are  characterized  by  the  state 
vector  X  =  [x,  y,  v,  9]T,  where  x  and  y  are  the  typical 
coordinates  in  the  arena  reference  frame,  v  is  the  linear 
speed,  and  9  is  the  orientation  or  heading  where  9  =  0 
corresponds  to  the  x-axis. 

Direct  measurement  of  the  first  two  states  is  possible  given 
the  sensors  available  in  this  experiment.  The  sensors  are 
imperfect  and  the  measurements  include  noise,  so  an  appro¬ 
priate  measurement-blending  filter  should  improve  results. 
The  third  state,  velocity  v,  cannot  be  directly  measured  with 
this  set  of  position  sensors  but  could  be  discretely  estimated 
given  a  sufficiently  short  sample  time.  The  final  state,  9 , 
cannot  in  any  way  be  directly  measured,  requiring  a  state¬ 
blending  filter  to  exploit  the  model  (in  particular,  the  non- 
holonomic  constraints)  allowing  9  to  be  estimated. 

To  preserve  scalability  of  the  swarm  by  limiting  required 
high  update  rate  agent-to-host  communication,  the  control 
input  to  the  individual  robots  is  unknown  to  the  external 
observer.  Bar-Shalom  [6]  offers  multiple  methods  for  dealing 
with  such  maneuvering  targets :  modeling  the  maneuver  input 
as  a  random  white  noise  process  or  autocorrelated  Markov 
noise;  estimating  the  input  to  correct  the  state  estimate; 
or  augmenting  the  state  with  the  maneuver  input  to  be 
estimated. 

To  propagate  the  model,  changes  in  states  v  and  9 
are  attributed  to  process  noise,  assumed  to  be  zero-mean 
Gaussian-distributed  white-noise  with  covariance  £„  and 
respectively.  These  process  noise  parameters  thus  account  for 
the  unmodeled  and  unmeasured  control  signal.  By  including 
these  contributions  as  noise,  the  magnitude  of  the  parameters 
are  larger  than  required,  but  with  the  benefit  of  a  method 


2743 


that  requires  no  communication  with  the  robot.  The  state 
transition  and  measurement  models  are  defined  in  Eqs.  (1)- 

(3). 
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The  measurements  provided  by  the  vision  system  or 
LADAR  pair  are  absolute  position  measurements,  assumed  to 
include  zero-mean  Gaussian-distributed  white  noise  wx .  wy 
with  covariance  Wx.  Wy.  Measurements  are  denoted  xm,  ym. 
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These  estimates  are  propagated  and  updated  utilizing  a 
2nd-order  Kalman  Filter  [7],  [8].  The  Jacobian  A  of  the  state 
transition  model  is  given  in  Eq.  (4).  The  Hessian  matrices, 
denoted  as  F\ .  b\ .  F:i .  F4 ,  of  each  state’s  transition  function 
are  shown  in  Eq.  (5),  where  subscript  k  indicates  the  iteration 
step.  Note  that,  as  the  third  and  fourth  rows  (f:>  and  fA  of  the 
state-transition  function  depend  only  on  the  noise  parameters 
and  not  any  of  the  states,  F3  =  F^  =  0. 
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The  prediction  step  is  given  in  Eqs.  (6)  and  (7),  which 
is  used  at  each  iteration  to  update  states  before  applying 
the  correction.  These  estimates  are  propagated  and  updated 
in  typical  form.  The  propagation  and  update  equations  are 
presented  without  derivation;  additional  details  may  be  found 
in  references  [7]  and  [8]. 

Xk+i=Xk  +  F(X(t))At 

+  i[l  0  0  0]T  tr(FiPk)  (6) 

+  i[0  1  0  0  ]rtr{F2Pk) 


Pk+ 1  —  AkPkAk  +  Qk  (7) 


Qk 


0  0  0  0 

0  0  0  0 

0  0  Ev  0 

0  0  0  Ei(v\ 


(8) 


Without  measurements,  the  system  propagates  via  the  state 
transition  nonlinear  function.  In  update  steps  where  measure¬ 
ments  are  available  and  valid,  the  innovation  is  performed  as 
shown  in  Eq.  (9). 

Sk+i  =  CPk+iCT  +  R 

Kk+i  =  Pk+iCTS^  (9) 

Xk+\  =  Xk+\  +  Kk+i(Zk+i  —  CXk+i) 

Pk+i  =  (I  —  Kk+1C)Pk 


R  = 


Wx 

0 


0 

Wy_ 


(10) 


In  Eq.  (7)  and  Eq.  (8)  above,  the  process  noise  covariance 
matrix  (more  specifically,  Eek)  is  indicated  as  a  function  of 
time-step  k.  This  is  due  to  the  observation  that  as  v(t)  — »  0, 
the  uncertainty  in  0  rises  due  to  the  dynamics  of  the  robot. 
For  this  reason,  Eek  is  defined  as  a  function  of  v,  such  that 
it  achieves  its  maximum  value  for  v  =  0  and  decreases 
monotonically  to  E®  0  at  v  =  v.  For  more  details  of 
this  state-varying-noise  adaptation,  along  with  experimental 
results,  see  [9]. 


B.  Tracking 

Track  adherence  is  critical  to  maintaining  good  estimation 
by  ensuring  that  measurements  are  applied  properly  to  the 
estimates  of  the  correct  agents.  In  the  presence  of  multiple, 
physically  homogeneous  agents  in  the  field-of-regard  for  the 
LADAR  and  vision  systems,  the  order  of  measurements 
may  be  rearranged  from  sample  to  sample.  Additionally, 
there  may  be  occasional  excessive  error  in  the  measurements 
due  to  a  sensor  glitch  or  communication  error,  or  false 
identifications  returned  either  by  the  LADAR  arc-detection 
or  by  the  vision  system. 

Data  validation  in  tracking  has  been  studied  for  several 
years,  in  many  cases  alongside  Kalman  Filter  methods  and 
especially  in  related  fields  such  as  radar  tracking.  Reid’s 
method  of  tracking  multiple  targets  [10]  provides  clear 
applicability  to  this  case,  although  this  method  does  not 
account  for  drop  track,  the  case  where  a  target  is  no 
longer  in  the  field  of  view.  Bar-Shalom  [6]  covers  many 
methods  for  single-  or  multiple-  target  tracking,  including 
Reid’s  Multiple  Hypothesis  Tracking  (MHT)  algorithm.  Cox 
and  Hingorani  [11]  proposed  a  computationally  efficient 
implementation  of  Reid’s  MHT  algorithm,  achieving  optimal 
assignment  results  in  0(M2)  time. 

A  matching  and  validation  algorithm  has  been  developed 
to  validate  and  reorder  the  data.  The  method  draws  inspira¬ 
tion  from  all  of  the  above  methods. 

The  data  set  is  first  passed  through  a  “sudden  jump”  filter, 
which  simply  inspects  that  the  data  contains  real  values.  The 
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next  step,  target  matching  or  data  association,  is  based  on 
the  Hungarian  Algorithm  developed  by  Munkres  [12].  1 
Measurements  exhibiting  excessive  error  due  to  back¬ 
ground  clutter  or  sensor  noise  cannot  be  identified  easily 
as  described  above.  The  method  for  final  data  validation  is 
adapted  from  Leonard’s  localization  method  [13],  based  on 
the  prior  works  of  Reid  [10]  and  Bar-Shalom  [6].  At  each 
prediction  step  /,:,  a  validation  gate  g  is  established  around 
the  estimated  output  of  the  i-th  agent,  Zk  =  CXk.  The 
range  of  the  validation  gate  is  dependent  on  the  estimated 
covariance  of  error  in  the  prediction,  S.  The  predicted 
state  for  the  ?'-th  agent  Zk is  then  compared  to  the  j-th 
measurement,  Z'k. 


Vij  =  zi-zi 

(11) 

9 2  >  ViiSr'v? 

(12) 

The  parameter  g  defines  the  region  of  acceptance;  the  vali¬ 
dation  region  is  x2 -distributed  and  depends  on  the  dimension 
of  the  measurement  vector.  For  a  2-dimensional  measurement 
vector,  g  =  -\/4.61  ~  2.15. 

C.  Individual  Localization 

Individual  self-localization  requires  the  robot  to  have  the 
ability  to  make  appropriate  measurements  of  its  environ¬ 
ment.  In  the  scenario  where  a  priori  information  about  the 
environment  is  known  and  where  the  robot  has  the  ability 
to  make  measurements  with  sufficiently  high  resolution,  the 
self-localization  problem  has  been  extensively  covered  [13]— 
[17].  Simple  robots,  such  as  the  Khepera-II,  do  not  have  such 
sensing  capability  and  must  rely  on  external  position  updates. 
To  maintain  the  on-board  position  estimate,  odometry  is 
utilized.  The  odometry  model  should  be  calibrated  in  order  to 
minimize  the  required  rate  of  the  external  updates.  Odometry 
calibration  for  this  system  is  based  on  Antonelli’s  Least 
Squares  Method  [18],  [19].  Implementation  of  this  method 
is  discussed  in  [5],  [20]. 

Odometric  error  increases  quadratically  with  distance  [21], 
hence  it  is  quite  susceptible  to  model  errors.  Many  meth¬ 
ods  for  calibrating  odometry  or  mitigating  the  associated 
errors  have  been  proposed  [15],  [18],  [19],  [22],  [23].  The 
sources  of  odometric  error  can  be  reduced  to  two  subgroups: 
systematic  error  and  non-systematic  error.  For  any  of  these 
methods  to  work  properly  there  must  be  a  known  position 
reference  to  allow  correction.  The  systematic  error  can  be 
reduced  by  appropriate  parameterization  of  the  odometry 
model,  followed  by  application  of  an  appropriate  estimation 
method.  With  appropriate  feedback,  the  non-systematic  error 
can  also  be  reduced  [15], 

For  a  differential-drive,  two-wheeled  robot,  the  approxi¬ 
mate  odometry  equations  may  be  written  in  discrete  time  as 
shown  in  Eq.  (13)  and  Eq.  (14).  The  position  and  orientation 
(or  pose)  of  the  robot  are  given  by  ( x,y,9 )  in  world 
coordinates.  For  simplicity  of  frame  mapping,  the  position 

1  Our  implementation  uses  utilizes  a  software  package  written  by  Markus 
Buehren,  updated  22  Oct  2008: 

http://www.mathworks.com/matlabcentral/fileexchange/6543 


of  the  robot  is  taken  at  the  midpoint  of  the  wheel  axis, 
the  natural  center  of  rotation.  The  frame  is  right-handed, 
with  6  measured  positive  about  the  z-axis.  For  incremental 
parameters,  the  subscript  k  denotes  the  change  in  value 
over  the  interval  [fc  —  1,  k] .  For  cumulative  parameters,  the 
subscript  k  denotes  the  value  at  the  end  of  the  interval.  The 
incremental  angular  rotation  of  each  driveshaft  is  Ai/t^L. 
The  wheel  radii  are  represented  as  rR/L.  The  encoder  scale 
factor  mapping  encoder  counts  to  radians  of  wheel  shaft 
rotation  is  c Ri  p .  Incremental  encoder  measurements  for  each 
wheel  are  given  by  Sek '  .  The  incremental  distance  traveled 

D  /  T 

by  each  wheel  is  given  by  dk '  .  The  separation  of  the  points 
of  contact  of  the  wheels  is  given  by  b.  The  incremental 
change  in  orientation  of  the  robot  is  given  by  A9k,  and  the 
incremental,  scalar  change  in  position  is  denoted  A Sk. 


M>k'L  =  Sek'LcR/L 
d*/L=rR/LA^L 
d?  —  dk 


(13) 


i  A  C  ■  <A9k\  to  ,  A()k\ 

Xfe+i  =  xk  +  A<5fcSinc(— ^— )  cos  [9k  + 

,  ac  ■  tA6k \  ■  to  i  A9k\  (14) 

Uk+i  =Vk  +  ASfcSinc(— — )  sm(0fc  + 

9k+i  =  9k  +  A  6k 

where  sinc(x)  =  smx,  lim  x^o  sinc(x)  =  1. 

Additional  navigation  values  of  interest  are  the  curvilinear 
abscissa  S,  the  total  direction-ignorant  accumulated  distance 
traveled  Sabs  ( Sabs  >  0),  and  the  total  orientation  change 
from  initial  position  9q.  These  are  accumulated  as  shown  in 
Eq.  (15).  The  recording  of  these  additional  values  allows  the 
use  of  methods  such  as  Antonelli’s  Least-Squares  [18],  [19]. 


'Sfc+i  =  St 5  +  |ASfc| 

Sk+1=Sk  +  ASk  (15) 

9k+ 1  =  9k  +  A  9k 

Antonelli’s  method  of  odometry  calibration  [18],  [19] 
starts  with  the  mapping  of  measured  driveshaft  angular 
velocities  [ojRiu}l]t  to  state  parameters  [u,  6]1  (linear  and 
angular  speed)  via  the  state-transition  matrix  C  as  shown  in 
Eq.  (16). 


V 

rR/Z 

rL/2' 

Ur 

6 

rR/b 

- rL/b 

ul_ 

(16) 


The  method  requires  construction  of  two  regressor  from 
P  trajectories,  one  for  orientation  and  the  other  for  position. 
This  data  is  compared  to  start  and  end  values  of  accumulated 
wheel  encoder  values.  This  offline  odometry  calibration 
method  is  adapted  to  perform  in  an  online  setting,  with 
the  individual  robot  following  a  trajectory  consistent  with 
its  task  (mapping,  foraging,  etc.).  These  trajectory  segments 
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together  are  the  P  trajectories.  With  each  new  measurement, 
the  regressor  matrices  are  augmented,  and  a  new  C  found. 
Implementation  of  such  an  algorithm  on  an  on-board  proces¬ 
sor  may  be  limited  by  memory  restrictions  (as  the  regressor 
matrices  and  observation  vectors  must  be  augmented  at  each 
time  step)  and  processor  capability  (foremost,  the  ability 
to  efficiently  perform  matrix  inversion;  at  each  update  this 
inversion  would  involve  a  larger  matrix). 

A  system  using  this  method  is  constrained  by  the  require¬ 
ment  of  a  fast  external  position-tracking  system  to  accu¬ 
rately  observe  S;  the  position-tracking  system’s  sampling 
rate  defines  the  upper  bound  of  the  robots’  velocities.  The 
measurement  system  for  this  type  of  calibration  must  be 
capable  of  accurately  measuring  orientation,  requiring  good 
performance  of  the  External  Localization  described  above. 

IV.  Identification 

The  final  task  is  to  unify  all  of  the  above  estimation 
methods  to  yield  communication  with  multiple  robots  with 
whom  communication  is  possible,  yet  whose  positions  are 
unknown  either  to  themselves  or  the  central  observer. 

It  is  desired  to  find  a  set  of  parameters  that,  under  non¬ 
identical  motion  of  agents,  will  provide  discriminatory  infor¬ 
mation.  The  proposed  set  of  data  to  be  used  for  identification 
is  [ATfc,  ^(/c)],  the  current  state  estimate  and  the  current 
estimated  magnitude  of  the  estimated  abscissa.  The  agents, 
however,  do  not  know  the  true  values  of  Xq  in  the  absolute 
reference  frame,  so  this  vector  is  actually  the  agent’s  estimate 
of  the  changes  in  state  and  abscissa  estimate  since  sample 
k  =  0. 

This  places  an  additional  requirement  on  the  external 
observer  to  maintain  the  cumulative  navigation  parameter 
S(k),  as  well  as  the  initial  positions  of  each  robot. 

A  probability-based  identification  hypothesis  for  matching 
external  estimates  to  on-board  estimates  of  these  proposed 
parameters  can  then  be  constructed.  A  simple  implementa¬ 
tion  would  be  to  once  again  utilize  the  matching  and  gate- 
validation  method  described  above  in  Section  III-B  and  in 
Eqs.  (11)-(12).  The  cost  inputs  would  be  related  to  the 
differences  between  the  external  and  internal  estimates  of 
S(k)  and  of  A X(k). 

For  sufficient  confidence  in  the  discrimination,  the  tra¬ 
jectories  of  the  robots  must  necessarily  be  distinct.  The 
external  observer  may  be  able  to  estimate  a  confidence  factor 
by  constructing  the  trajectory-richness  parameters  suggested 
by  Antonelli  [19]  and  discussed  in  Section  III-C.  As  the 
condition  numbers  of  these  matrices  grow,  so  should  the 
confidence  that  the  discrimination  is  correct. 

All  of  the  above  estimation  methods  must  perform  reliably 
for  this  Identification  to  work.  The  External  Localization 
and  Tracking  methods  are  tied  closely  together,  as  without 
one  the  other  will  not  (and  cannot)  work.  Without  Track¬ 
ing,  the  external  observer  cannot  accurately  accumulate  the 
navigation  parameters  of  individual  robots.  Communication 
is  necessary  to  relate  the  internal  estimates  to  the  external 
observer.  These  information  packets  are  small  (five  floating¬ 
point  numbers),  and  the  rate  of  communication  required  is 


very  low.  There  would  be  little  or  no  benefit  to  transmit  these 
parameters  to  the  central  observer  at  high  rate,  as  the  system 
requires  time  to  develop  sufficient  richness  and  the  dynamics 
of  the  robots  are  quite  slow. 

V.  Preliminary  Experimental  Results 

For  verification  of  this  method,  a  more  capable  experi¬ 
mental  setup  is  employed.  The  arena  for  robot  exploration 
is  an  area  that  is  roughly  5 m  x  3.5 m.  The  agents  in 
this  experiment  are  Create  mobile  robots  from  iRobot.  The 
motion  capture  video  system  is  comprised  of  twelve  Opti- 
Track  V100R2  cameras  from  NaturalPoint,  and  the  tracking 
software  utilizes  the  NaturalPoint  Tracking  Tools  API.  Each 
Create  robot  is  assigned  a  unique  pattern  of  reflective  markers 
and  a  corresponding  “rigid  body”  definition  in  the  software. 
This  allows  fairly  robust  tracking  of  the  agents  at  100Hz. 

The  information  made  available  to  the  supervisor  is  the 
same  limited  information  described  in  the  above  sections, 
with  the  additional  information  available  for  use  in  verifica¬ 
tion  and  scoring  of  the  method. 

A  sample  set  of  trajectories  is  shown  in  Fig.  3.  This 
data  was  gathered  with  five  agents  moving  simultaneously. 
The  control  consisted  of  two  components:  a  random  (uni¬ 
formly  distributed)  input  for  each  linear  acceleration  (small, 
positive  mean)  and  angular  acceleration  (zero-mean);  and 
agent  avoidance  via  simulated  sensors.  Velocity  is  capped 
at  Vmax  —  100 mm/s.  Additionally,  the  agents  are  equipped 
with  bump  sensors;  if  activated,  the  agent  reversed  at  max¬ 
imum  velocity,  then  rotated  approximately  90°  with  the 
direction  of  rotation  chosen  arbitrarily. 

Initial  tests  showed  that  parameter  S  is  not  a  very  good 
estimator  unless  the  supervisor  can  very  accurately  determine 
orientation.  In  contrast,  X  and  Sabs  are  quite  simple  to 
maintain  by  the  supervisor,  but  with  the  control  system 
used  thus  far  does  not  provide  a  good  differentiator  between 
agents.  This  is  due  to  the  maximum  velocity  with  small, 
positive  mean  acceleration.  As  a  result,  the  agents  spend  most 
of  their  time  at  maximum  velocity;  hence,  if  all  robots  begin 
moving  at  the  same  time,  the  total  accumulated  distance 
traveled  at  time  T  for  each  robot  approaches  T  *  Vmax. 

Additional  limitations  in  testing  have  arisen  due  to  com¬ 
munication  limitations  in  the  contol  of  each  individual  robot. 
In  this  implementation,  a  single  computer  performs  all 
of  the  work.  Five  identical  instances  of  the  robot  control 
code  execute  simultaneously,  each  one  communicating  with 
a  different  robot  via  Bluetooth.  Under  this  heavy  traffic 
load,  communication  between  the  robot  hardware  and  its 
“virtual  controller”  is  inconsistent,  and  the  data  is  occa¬ 
sionally  corrupted.  This  problem  is  being  addressed  via  the 
introduction  of  on-board  computers  on  each  robot.  Gumstix 
Verdex  computers  will  be  interfaced  with  each  robot  via 
a  Sticky  Interface  from  ElementDirect,  allowing  the  robots 
to  truly  be  independently  controlled.  Communication  with 
the  supervisor  can  then  occur  via  the  more  reliable  wireless 
Ethernet  interface. 

Additional  simulations  utilizing  Player/Stage  [24]  are  un¬ 
der  way  to  confirm  the  success  of  this  method  and  to  assess 
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Fig.  3.  Characteristic  trajectories  for  a  set  of  five  agents  moving  randomly 
in  the  arena. 


[15] 


processing  limitations  with  greater  numbers  of  robots.  The 
simulations  also  allow  a  greater  range  of  testing  parameters.  [16] 


VI.  CONCLUSION 

[17] 

A  method  for  inferential  identification  has  been  proposed 
based  on  prior  experiments  and  on-going  simulation.  This 
identification  allows  the  establishment  of  external  feedback 

L 1  o  J 

for  a  collection  of  visually  indistinguishable  robots,  which 
in  turn  allows  for  odometry  correction  and  calibration,  and 
orientation  estimation.  These  methods  require  no  special 
equipment  or  exceptional  machine  vision  processing.  The 
results  are  also  marginally  scalable,  with  limitations  only  in 
the  matching  phase  (the  Hungarian  Method)  of  the  tracking 
algorithm. 
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Abstract — Cooperative  mobile  robots  must  have  knowledge  of 
their  positions  relative  to  the  group  in  which  they  are  operating. 
Common  on-board  sensors  such  as  laser  rangefinders  may  be 
used  to  detect  and  track  other  robots  with  high  precision,  though 
limited  feature  recognition  and  susceptibility  to  occlusion  reduces 
the  efficacy  of  this  solution  alone.  Matching  multiple  robots’ 
laser  scans  can  overcome  some  of  these  issues,  but  requires 
extensive  memory  usage  and  large  communication  bandwidth. 
Overhead  imaging  systems  may  also  be  utilized,  though  sensor 
nonlinearities,  field  of  view  restrictions,  and  data  latency  limit 
such  usage. 

A  data  fusion  method  is  proposed  for  dynamically  evaluating  a 
mobile  robot’s  position  by  matching  laser  scan  data  to  overhead 
image  data. 

I.  Introduction 

For  autonomous  mobile  robots,  whether  acting  indepen¬ 
dently  or  within  a  swarm,  knowledge  of  its  position  within  the 
environment  is  critical  for  successful  navigation  and  conduct 
of  tasks.  This  position  may  be  relative  to  features  in  the 
environment,  such  as  the  starting  position,  goal,  or  other 
arbitrary  point,  or  may  be  relative  to  other  members  of 
the  swarm.  If  a  supervisory  positioning  system  is  available, 
localization  may  become  a  trivial  task,  although  the  speed 
and  reliability  of  communication  arises  as  a  limiting  factor. 
To  mitigate  the  necessity  of  high-speed  communication  as  well 
as  to  allow  scalability  of  a  swarm  (where  the  communication 
and  perceptive  requirements  of  the  central  supervisory  member 
quickly  exceed  capability),  the  individual  mobile  member 
should  be  tasked  with  additional  localization  responsibility. 

For  the  specific  case  of  formation  control,  the  robot’s 
position  relative  to  the  formation  is  as  important,  if  not  more 
so,  than  the  position  relative  to  the  workspace.  The  works  of 
Yamaguchi  and  Burdick  [1],  Desai  et  al  [2],  Tanner  et  al,  [3]- 
[6],  and  Antonelli  [7]  provide  multiple  methods  for  controlling 
formations  of  robots  under  various  sensing  and  communication 
scenarios. 

Published  research  in  collective  or  collaborative  localization 
includes  that  of  Fenwick,  Newmann,  and  Leonard,  [8],  Fierro, 
et  al  [9],  Howard,  Mataric,  and  Sukhatme  [10],  Rekleitis, 
Dudek,  and  Milios,  [11],  and  Roumeliotis  and  Bekey  [12], 
[13],  A  thorough  discussion  of  the  necessity  of  localization  is 
provided  by  Negenborn  in  his  Master’s  Thesis  [14]. 


Laser  rangefinders  have  become  ubiquitous  on-board  sen¬ 
sors  for  mobile  robots  in  the  last  decade  due  to  their  high 
accuracy  (for  most  common  models,  the  error  is  less  than 
lcm  over  the  usable  range),  large  field  of  view  (frequently 
>  180°),  and  high  speed  data  collection  (10//,:  or  faster  for 
full  scan).  Laser  rangefinders  seem  to  be  the  sensors  of  choice 
for  simultaneous  localization  and  mapping  applications  [15]. 

Laser  rangefinders  have  two  major  limitations  in  multi¬ 
robot  applications,  however.  The  most  severe  is  the  problem  of 
occlusion.  Since  these  sensors  make  measurements  in  the  same 
plane  as  the  robot  (or  a  parallel  plane),  only  the  nearest  object 
to  the  rangefinder  at  that  planar  height  can  be  measured.  This 
is  illustrated  in  Fig.  1.  The  secondary  problem  is  the  sparsity 
of  information  contained  in  the  data.  That  is,  the  rangefinder 
measures  the  nearest  return  surface,  but  categorizing  that 
surface  requires  specialized  processing. 

Overhead  imaging  systems  may  seem  to  be  an  end-all 
solution  to  the  problem  of  formation  measurement  and  off- 
board  localization.  In  fact,  such  systems  do  in  many  cases 
simplify  measurements  of  robot  locations.  Machine  vision 
software,  such  as  OpenCV  [16],  [17],  is  readily  available  for 
basic  feature  extraction  and  pattern  matching.  These  systems, 
however,  also  have  limitations  in  processing  time,  calibration 
requirements,  and  a  priori  knowledge  requirements  such  as 
the  definitions  of  patterns. 

II.  Problem  Description 

A  method  is  sought  to  localize  all  mobile  members  of 
the  collective  relative  to  a  common  frame.  In  this  work,  we 
provide  an  architecture  for  relative  localization  of  members 
of  a  robot  collection  by  the  fusion  of  various  sensor  data 
sets.  This  method  is  based  on  a  combination  of  multiple  laser 
rangefinders  making  in-plane  measurements  of  the  environ¬ 
ment  (which  includes  neighboring  robots)  with  one  or  more 
optical  overhead  images  taken  by  a  supervisory  observer. 

Foremost  is  the  problem  of  data  processing.  In  order  to 
provide  accurate  measurement  capability,  digital  images  must 
be  of  high  pixel  resolution,  placing  stiff  requirements  for  data 
bandwidth,  processing  power,  and  memory  space.  Memory 
requirements  increase  with  the  square  of  pixel  resolution  (for 
fixed  field-of-view),  and  processing  power  requirements  grow 
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Fig.  1.  The  occlusion  problem  for  laser  rangefinders.  The  scale  is  based  on 
a  SICK  LMS400  detecting  Khepera-II  robots. 


at  least  as  fast.  Sensors  capable  of  high-resolution,  high- 
bandwidth  imaging  such  as  this  can  be  quite  expensive,  as 
is  the  processing  required  to  turn  these  images  into  usable 
data  for  the  robots  at  sufficient  data  rates. 

Secondary  and  tertiary  issues  with  using  overhead  imaging 
are  the  calibration  of  the  imaging  system  and  the  finite  field 
of  view  of  the  system.  We  seek  the  implementation  of  an 
overhead  imaging  system  mounted  to  an  aerial  robot,  meaning 
that  the  imaging  sensor  frame  is  explicitly  not  fixed  relative 
to  the  robotic  workspace.  Imaging  sensors  may  incorporate 
zoom  lenses  which  dynamically  change  the  effective  pixel 
resolution;  with  the  exception  of  very  specialized  cameras, 
however,  the  field  of  view  is  generally  limited  to  around  50°. 
In  order  to  capture  a  desired  planar  area,  the  altitude  at  which 
the  UAV-mounted  sensor  must  operate  is  dictated.  The  planar 
resolution  of  the  camera  is  a  function  of  the  distance  to  the 
plane,  however,  so  a  trade  arises  between  maximized  coverage 
area  and  optimal  planar  pixel  resolution.  Related  to  this  is  the 
common  problem  of  nonlinear  planar  pixel  scaling;  for  most 
digital  imagers,  there  may  be  large  differences  between  the 
planar  pixel  scale  factor  along  the  nadir  or  line-of-sight  axis 
of  the  camera  and  the  same  scale  factor  at  the  edges  of  the 
field  of  view.  This  scale  factor  variation  is  exacerbated  by 
larger  angular  fields  of  view,  i.e.,  as  the  angle  from  the  focal 
plane  to  the  sensed  plane  increases. 

A  sample  image  taken  from  an  overhead  machine- vision 
sensor  is  given  in  Fig.  2.  This  image  is  cropped;  the  center 
of  the  true  image  field  of  view  is  to  the  right  of  center  in 
this  image.  Note  the  apparent  curvature  of  the  arena  wall  on 
the  left  of  the  image,  and  the  projection  of  the  robot  bodies 
at  the  lower  extent  of  the  image.  These  projection  issues  can 
induce  estimation  errors  in  the  true  locations  of  the  robots 
within  the  field,  and,  if  pattern  matching  is  utilized  for  robot 
discrimination,  may  severely  reduce  the  rate  of  success  (see, 

e.g.,  [18]). 

III.  Proposed  Method 

The  first  phase  of  estimation  is  performed  by  each  individual 
mobile  sensor.  Each  robot  i  detects  other  robots  within  its 
laser  scan  field  of  regard.  For  this,  the  arc-detection  method 
developed  by  the  authors  (described  in  [18],  [19])  is  employed. 


Fig.  2.  Sample  overhead  image  from  Cognex  5400  camera  of  a  rectangular 
arena  with  Khepera-II  robots  extending  above  a  planar  surface. 


A  geometric  graph  S,  is  assembled  from  these  relative  mea¬ 
surements.  This  subgraph  of  the  total  robot  collective  graph 
may  not  be  observable  by  any  one  sensor. 

In  similar  fashion,  the  overhead  observer  performs  simple 
blob-detection  or  pattern-matching  to  identify  robots  within 
its  field-of-view,  and  constructs  a  geometric  graph  G  of  all 
detected  robots. 

The  crux  of  the  problem  is  then  the  data  assembly  to 
combine  all  of  these  subgraphs  into  the  geometric  graph  repre¬ 
senting  the  entire  system.  In  this  initial  work,  we  will  assume 
that  the  overhead  observer  can  detect  the  entire  collection  of 
robots,  implying  that  the  complete  graph  is  known,  though  it 
includes  sensing  error.  We  denote  the  complete  graph  G  and 
the  subgraph  detected  by  agent  i  as  S,. 

The  method  currently  employed  is  an  exhaustive  compar¬ 
ison  of  the  subgraph  in  question  with  the  complete  graph. 
This  problem,  called  subgraph  isomorphism,  is  unfortunately 
among  the  list  of  proven  NP-complete  problems  [20]. 

The  algorithm  is  executed  as  following.  First,  the  ranks 
of  the  graph  and  subgraph,  re  and  rS/ ,  respectively,  are 
established.  The  assumption  that  Si  C  G  implies  re  > 
7's, ,  thus  the  set  of  possible  nodes  of  G  included  in  Si 
is  I  =  {l,...,rG}-  We  then  generate  a  list  of  hypotheses, 
permutations  of  rs,  indices  chosen  from  set  I. 

The  algorithm  then  loops  over  the  set  of  hypotheses,  I, 
testing  each  hypothesis.  The  test  criterion  is  to  identify  the 
hypothesis,  I„  C  I,  which  minimizes  the  difference  between 
the  truncated  graph  G  with  the  subgraph  Si.  That  is, 
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where  Gn  is  the  submatrix  of  G  corresponding  to  the  index 
set  ln. 

IV.  Experimental  Setup 

We  are  utilizing  three  different  experimental  configurations 
to  verify  this  method.  The  first,  described  in  detail  in  [18], 
utilizes  a  fixed  overhead  Cognex  5400  machine-vision  sensor 
and  computer  observing  a  1.6m  x  0.8ro  arena  containing  up  to 
twelve  Khepera-II  robots.  Additional  sensing  is  provided  by 
a  pair  of  SICK  LMS400  LADAR  sensors.  This  system  with 
fixed,  external  sensors  is  used  to  measure  the  accuracy  and 
precision  of  this  method,  as  this  system  is  well-characterized. 

The  second  experimental  configuration,  depicted  in  Fig.  3, 
is  a  larger  lab  space  (5.5m  x  5.3 m),  utilizing  several  iRobot 
Create  ground  vehicles.  The  Creates  are  each  equipped  with  an 
on-board  computer  (currently,  Dell  MinilO  netbooks),  a  web¬ 
cam  (Logitch  QuickCam  Pro  9000),  and  a  laser  rangefinder 
(Hokuyo  URG-04LX-UG01).  Additional  Create  robots  are 
planned  with  similar  sensing  but  utilizing  GumStix  Verdex 
on-board  computers.  Overhead  imaging  is  provided  by  one 
or  two  Skybotix  CoaX  helicopters  each  equipped  with  a 
webcam  (Targus  AVC05EU)  and  an  on-board  computer  (Gum¬ 
Stix  Overo).  Additionally,  our  research  group  is  develop¬ 
ing  a  semiautonomous  indoor  blimp,  equipped  with  an  on¬ 
board  computer  (GumStix  Verdex)  and  I/O  board  (Gumstix 
RoboStix)  to  support  a  webcam.  For  metrology  as  well  as 
possible  augmentation  of  the  vision  capabilities,  this  lab  space 
is  equipped  with  a  12-camera  NaturalPoint  OptiTrack  motion 
tracking  system.  A  Create  robot  in  our  experimental  configu¬ 
ration  is  shown  alongside  a  CoaX  aerial  robot  in  Fig.  4. 

The  third  experimental  configuration  is  a  simulated  world, 
implemented  in  Player/Stage  [21].  The  simulation  allows  us  a 
path  for  software  development  with  fewer  uncertainties  than 
are  usually  encountered  with  real  hardware  systems.  It  also 
provides  us  with  the  ability  to  predict  whether  this  method 
will  be  successful  and  estimates  of  error  bounds,  as  well  as  to 
evaluate  the  effects  of  different  types  of  sensors  to  be  procured 
in  the  future. 

The  strengths  of  each  sensor  type  should  be  exploited  to 
improve  the  estimation  quality  of  the  system.  The  utilization  of 
multiple  in-plane  laser  rangefinders  on-board  individual  robots 


Fig.  4.  The  Skybotix  CoaX  helicopter  with  one  iRobot  Create  in  experimental 
configuration. 


Fig.  5.  On-board  laser  data  superimposed  onto  sketchup  of  lab  layout, 
including  obstacles  and  other  robots.  The  grey  circle  indicates  a  typical  field 
of  view  for  a  camera  field-of-view  on  one  of  the  aerial  vehicles. 


extends  the  sensed  area.  Including  overhead  images  fills  gaps 
where  the  rangefinders  are  occluded.  Use  of  the  overhead 
image  provides  a  template  onto  which  the  rangefinder  data 
may  be  matched,  yielding  a  common  frame  of  reference  for 
all  vehicles  in  the  collective. 

The  ultimate  goal  of  this  matching  is  to  provide  a  method 
of  unifying  multiple  sensor  frames  into  a  single  relative  frame, 
including  correction  for  rotation,  translation,  and  scaling.  As 
of  this  writing,  the  problem  of  skewing  or  stretching  has  not 
yet  been  addressed. 
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Fig.  6.  Two  laser  rangefinder  scans,  correlated  into  a  single  map.  The 
difficulty  in  identifying  robots  in  this  data  is  made  clear;  there  are  six  robots 
in  the  field  in  these  scans. 

V.  Results 

The  most  obvious  downfall  of  this  method  is  the  assumption 
that  the  graph  topography  does  not  contain  repeated  subgraph 
isomorphisms  of  rank  r's, .  I.e.,  the  elements  of  the  graph  must 
be  sufficiently  diverse.  This  assumption  leads  to  the  practical 
requirement  that  subgraph  Si  is  large  enough.  Exactly  how 
large  rSi  must  be  depends  on  the  diversity  of  the  full  graph 
G.  It  should  be  clear  that  this  method  cannot  work  for  very 
regular  formations,  such  as  a  rigid  formation  of  equilateral 
triangles. 

If  the  above  condition  is  met  (i.e.,  rs ,  is  large  enough  and  G 
is  diverse  enough),  and  when  graph  G  and  subgraph  S,  contain 
no  error,  any  matrix  norm  works  with  this  method.  Indeed, 
even  the  scaling  of  S,  is  unnecessary  if  the  sensor  frames 
being  matched  have  uniform  scaling.  If  noise  is  included,  the 
Frobenius  norm  works  most  reliably,  as  it  is  a  corollary  to  the 
sum-of-squared-error  in  vector  space. 

VI.  Conclusion 

This  work  is  on-going  with  data  collected  in  the  Khepera-II 
lab,  along  with  new  experiments  and  data  collection  in  the 
Create  lab  and  in  the  Player/Stage  simulation.  Preliminary 
results  are  promising,  though  the  parameter  space  has  yet  to  be 
explored  for  robustness  and  processing  efficiency  -  specifically 
the  scalability  of  this  problem.  If  this  method  is  to  be  used 
with  a  collective  or  swarm  robotic  system,  the  processing 
requirements  must  be  fixed  (or  nearly  so)  with  increasing 
numbers  of  robots,  requiring  that  each  individual  robot  must 
do  as  much  processing  as  possible.  Continuing  work  focuses 
on  more  efficient  methods,  such  as  the  search  algorithms  in 
[22]  and  the  use  of  graph  topology  for  image  processing  [23]. 
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ABSTRACT 

The  problem  of  perimeter  detection  and  monitoring  has  a 
variety  of  applications.  In  this  paper,  a  hybrid  system  with  fi¬ 
nite  states  strategy  is  proposed  for  multiple  autonomous  robotic 
agents  with  the  purpose  of  hazardous  spill  perimeter  detection 
and  tracking.  In  the  system,  each  robotic  agent  is  assumed  to  be 
in  one  of  the  following  states:  searching,  pursuing,  and  tracking. 
The  agents  are  prioritized  based  on  their  states,  and  a  potential 
field  is  constructed  for  agents  in  each  state.  The  location  and  ve¬ 
locity  of  an  agent  in  the  tracking  state  as  well  as  those  of  its  clos¬ 
est  leading  and  trailing  agents  are  utilized  to  control  its  move¬ 
ment.  The  convergence  of  the  tracking  algorithm  is  analyzed  for 
multiple  spills  under  certain  conditions.  Simulation  results  show 
that  with  the  proposed  method,  the  agents  can  successfully  detect 
and  track  the  spills  of  various  sizes  and  movements. 

INTRODUCTION 

The  development  of  swarm  robotics  has  emerged  as  a  tool 
for  mobile  sensor  network  in  a  variety  of  areas,  such  as  envi¬ 
ronment  monitoring  [1],  foraging  [2]  [3],  target  detection  [4], 
and  target  tracking  [5].  Two  different  kinds  of  control,  central¬ 
ized  control  and  decentralized  control  can  be  implemented  for 
the  robot  group.  In  centralized  control,  all  agents  in  the  group 
are  assumed  to  be  able  to  share  their  information  [6].  This  kind 
of  control  suffers  from  the  computational  burden  for  large  agent 
number.  More  often,  decentralized  control  is  used  for  swarm 
robotics  in  which  only  information  within  a  local  network  is 
shared  among  agents  [1]  [7],  In  this  paper  we  propose  a  method 
for  the  problem  of  perimeter  detection  and  tracking  with  swarm 

*  Address  all  correspondence  to  this  author. 


Devendra  P.  Garg* 

Greg  Fricke 

Mechanical  Engineering  &  Materials  Science  Dept. 
Duke  University 

Durham,  North  Carolina,  27708 
Email:  {dpgarg,  gregory.fricke}@duke.edu 


robots  which  may  have  a  variety  of  applications,  for  instance: 
forest  fire  surveillance  [8],  oil  leakage  tracking  [9],  and  animal 
herd  monitoring.  Some  of  the  previous  work  in  this  area  has 
been  reported  in  literature  as  follows.  Casbeer  et  al  [8]  proposed 
a  decentralized  multiple  UAV  approach  to  monitor  a  forest  fire. 
Each  UAV  flew  along  the  perimeter  of  the  fire  in  one  direction 
and  then  flew  backward  when  it  met  another  UAV.  This  move¬ 
ment  continued  and  the  movement  of  the  UAVs  formed  a  la¬ 
tency.  This  method  could  easily  adapt  to  a  change  in  the  fire 
size  and  the  number  of  UAVs  needed  during  the  process.  Susca 
et  al  [10]  [11]  used  a  polygon  to  approximate  an  environmental 
boundary.  Local  information  gathered  via  mobile  sensors  was 
used  to  locate  interpolation  points  to  form  a  polygon.  The  ver¬ 
tices  of  the  polygon  were  located  uniformly  along  the  boundary. 
When  the  boundary  was  stationary,  the  method  was  convergent. 

Clark  et  al  [9]  and  Cruz  et  al  [12]  utilized  a  hybrid  con¬ 
trol  algorithm  to  achieve  the  perimeter  search  and  monitoring  in 
an  unknown  environment.  The  robots  were  initially  randomly 
placed  in  the  work  space.  When  a  point  on  the  perimeter  was  de¬ 
tected  by  a  robot,  the  robot  was  navigated  by  a  potential  field  to 
approach  this  point  and  communicated  this  information  to  other 
robots.  When  a  robot  arrived  at  the  perimeter,  it  was  controlled 
by  a  tracking  controller  to  travel  along  the  perimeter.  Bruemmer 
et  al  [13]  utilized  the  social  potential  field  generated  by  differ¬ 
ent  kinds  of  sensors  on  each  agent  to  avoid  collision  and  attract 
agents  to  the  perimeter.  The  robot  learned  to  respond  to  fluctua¬ 
tions  in  sound  and  light  through  perception  and  communication 
such  as  sound  chirping,  touch  sensing,  and  moisture  detection. 

Another  line  of  research  using  multiple  robots  is  in  the  for¬ 
mation  control.  In  this  area,  a  group  of  agents  moves  to  ful¬ 
fill  some  task,  such  as  foraging  or  tracking,  while  maintaining 
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its  geometric  formation  along  the  way.  Olfati-Saber  [14]  [15] 
proposed  a  method  to  control  the  movement  of  a  flock.  A  po¬ 
tential  function  was  designed  whose  local  minima  were  set  to  be 
achieved  when  the  distance  between  each  agent  and  its  neighbors 
in  the  flock  equalled  a  desired  value.  The  method  was  proved  to 
be  asymptotically  structurally  stable  under  certain  conditions.  A 
(3  agent  and  a  y  agent  were  proposed  to  extend  this  method  in 
an  environment  having  a  number  of  obstacles.  The  flock  could 
avoid  obstacles  and  then  maintain  its  formation  during  its  move¬ 
ment.  The  stability  of  the  flocking  of  a  mobile  agent  group 
with  fixed  and  dynamic  topologies  was  studied  by  Tanner  el  al 
in  [16]  [17].  A  social  potential  field  for  the  agent  group  was  pro¬ 
posed  to  control  the  distance  between  an  agent  and  its  neighbors 
while  the  velocity  of  each  agent  was  controlled  by  the  difference 
in  the  velocity  from  its  neighbors.  However,  in  all  of  these  cases, 
the  distance  between  each  pair  of  agents  converged  to  a  desired 
value  which  was  predetermined.  This  was  not  suitable  for  the 
case  in  which  the  desired  distance  between  two  neighbors  was 
unknown  before  hand  or  it  changed  with  time. 

In  this  paper,  inspired  by  the  flocking  of  agents,  a  control  law 
is  proposed  for  a  group  of  agents  whose  purpose  is  to  search  and 
track  a  hazardous  spill  in  an  unknown  environment.  Each  agent 
is  assumed  to  have  limited  and  specific  sensing  and  communica¬ 
tion  ranges.  It  is  shown  that  the  group  can  successfully  detect  the 
spill  and  track  along  its  boundary  when  the  spill’s  location  and 
size  are  changing.  Simulation  results  also  showed  an  excellent 
performance  to  track  multiple  spills  that  may  come  together  or 
split  during  the  process. 


PROBLEM  FORMULATION 

A  group  of  autonomous  mobile  agents  M\ ,  Jh,  . ..,  M„  is 
assumed  to  be  initially  deployed  in  the  area  ‘W.  At  time  f,  the 
configuration  of  agent  i  is  represented  as  q,(f)  =  {r, -(f), 0, -(f)}, 
where  r,(f)  =  {r,(f),y,(f)}  denotes  the  position  of  Mj  at  time  t, 
and  0,(f)  denotes  its  orientation.  Each  agent  is  assumed  to  be 
equipped  with  a  sensor  whose  field  of  view  (FOV)  is  Si,  and 
communication  range  is  For  simplicity’s  sake,  in  our  pa¬ 
per,  we  assume  that  both  Si  and  Tf]  are  circles  whose  centers  are 
located  at  r,(f),  and  are  of  constant  radii  r$  and  r,w.  This  as¬ 
sumption  is  commonly  used  by  various  researchers  [14]  [18]  .  A 
number  of  spills  £2;(f )  with  the  boundary  d£lj(t)  are  deployed  in 
W.  A  spill  Clj(t)  is  defined  as  a  connected  subspace  in  the  two 
dimensional  space.  Each  agent  is  assumed  to  be  able  to  mea¬ 
sure  the  length  of  the  spill  boundary  within  its  FOV.  This  length 
may  be  computed  in  different  ways.  One  way  is  to  utilize  pat¬ 
tern  classification  technology  to  detect  the  spill  boundary  from 
a  visual  sensor  as  implemented  in  [19].  Then  the  detected  spill 
boundary  is  represented  by  a  series  of  points  on  it  and  a  straight 
line  segment  is  generated  with  each  pair  of  adjacent  points.  The 
spill  boundary  length  is  approximated  by  summing  the  lengths 
of  all  the  straight  line  segments.  When  307(f)  changes,  it  would 
be  beneficial  that  the  agent  group  can  uniformly  distribute  along 
the  spill  boundary  since  the  change  of  a  part  on  dO;  (f)  could  be 


Figure  1 .  SAMPLE  LAYOUT  OF  THE  MULTIPLE  AGENT  SYSTEM. 


detected  by  an  agent  quickly. 

A  method  is  proposed  to  control  the  movement  of  each  agent 
tracking  a  spill.  Only  the  positions  of  an  agent’s  closest  leading 
and  trailing  neighbors  along  its  way  may  be  utilized  to  generate 
a  potential  field  to  control  its  movement.  An  influence  distance 
along  the  spill  boundary,  denoted  by  L,  controls  how  the  closest 
neighbor  affects  an  agent’s  potential  field.  This  assumption  guar¬ 
antees  the  scalability  of  proposed  method  and  will  be  discussed 
in  the  “Proposed  Method”  section. 

The  problem  can  be  formulated  as  follows:  given  a  group 
of  agents  M\,  Sh,  . ..,  M,  and  their  initial  configurations 
qi(0),q2(0),...,q„(0),  find  the  path  for  each  agent  i,  where 
i  =  1,2 such  that  Ve  >  0  there  exists  a  time  T  >  0 
and  a  spill  index  j,  for  t  >  T  agent  i  tracks  £lj(t),  and 
\\ri(t),rN(ri(t))(t)\\dnj=Lj(t)  satisfies: 


mo- 

Lj(t)  >=L-e 


<  8 


if  nj(t)  >  » 
if  nj(t)  <  m[m 


(1) 


Here,  j  \a, b\ is  the  distance  from  a  to  b  along  the  curve  dc  with 
the  forward  direction  as  shown  in  Fig.  2;  N(r,(t))  is  the  index  of 
the  closest  neighbor  of  Mi  in  the  forward  direction;  n;(f)  is  the 
number  of  agents  that  are  tracking  at  time  t;  and  |  |3Q;-(f)|  |  is 
the  length  of  Furthermore,  the  velocities  of  agents  track¬ 

ing  the  same  spill  converge  to  the  same  value.  Mathematically 
speaking,  assume  that  agent  i  and  j  are  tracking  the  same  spill, 
Ve  >  0,  3T  >  0,  such  that  Vf  >  T  we  have  |v,-(f)  —  Vj(t)\  <  e. 
When  there  is  no  ambiguity,  we  simplify  each  variable  without 
writing  time;  for  example,  represents  lj(t). 

The  communication  among  the  mobile  agents  can  be  repre¬ 
sented  as  a  set  of  connected  undirected  graphs,  each  of  which  can 
be  defined  as  rEk),  where  T].  =  {a^ , a/2 , . . . , ai(k }  are  the 

agents  in  Qk,  and  "Ek  =  {(a;,,a/;)  S  Vk  x  Vk  \  ai^aij  £  Vk,aij  £ 
£ Wjj}.  The  information,  such  as  a  detected  point  of  the  spill 
boundary  by  an  agent,  is  assumed  to  be  able  to  be  shared  within 
the  graph  with  no  time  delay.  An  example  of  the  agent  system 
with  ten  agents  and  one  spill  is  shown  in  Fig.  1. 
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PROPOSED  METHOD 

A  hybrid  control  technique  is  used  to  control  the  agents. 
Three  behaviors,  searching,  pursuing,  and  tracking,  are  used  for 
perimeter  detection  and  tracking.  The  agents  are  assumed  to  have 
a  unicycle  model  in  which: 

Xj  =  Vjcosd j 
yi  =  VjSinQj 
Vi  =  Clj 

0,  =  (2) 

where  v;-  is  the  linear  velocity  of  the  zth  agent,  a,  is  its  linear 
acceleration,  and  CO,  is  its  angular  velocity.  The  control  inputs  of 
each  agent  are  set  up  as  [zq  w,],  where  u,  =  cij,  Wj  =  CO,-. 

Perimeter  Searching  and  Pursuing 

In  the  proposed  method,  the  robotic  agents  are  initially  ran¬ 
domly  deployed  in  an  unknown  environment.  The  geometry  of 
each  agent  is  ignored  in  the  system,  i.e.,  two  agents  collide  only 
when  their  positions  are  the  same.  To  avoid  collision  ,  we  assume 
that  there  is  a  repulsive  potential  field  generated  by  each  agent, 
denoted  as  t/,-  [20]: 

U(r)  =  /  (phW  —  ft))2  if  P(r’r')^P«  (3) 

\o  if  p(r,r,)>p0 

where  r|  i  is  a  scaling  parameter,  po  is  the  influence  distance  of 
each  agent,  r,-  is  the  z'th  agent  position,  and  p(r,r;)  is  the  distance 
between  r  and  r,. 

The  agents  in  different  states  are  assumed  to  have  different 
priority  levels  for  avoidance.  Agents  in  the  searching  state  are 
assumed  to  be  at  the  lowest  level,  and  agents  in  the  tracking  state 
are  at  the  highest  level.  Let  the  index  sets  Is,  Ip,  and  It  represent 
the  index  of  agents  in  searching,  pursuing,  and  tracking  states, 
respectively.  Assume  an  agent  in  the  searching  state  is  at  r,  and 
let  Ns( r)  =  {i  \  Vi,  r  £  fMj}  denote  the  index  of  agents  in  whose 
communication  range  r  lies.  When  Ns( r)  ^  0  the  following  po¬ 
tential  field  is  constructed  for  the  agent: 

Us(r)=  £  Ui{ r)  (4) 

zeZVjfr) 

and  the  force  implemented  on  the  agent  is  computed  as: 

Fs(  r)  =  -VUS(  r)  (5) 

||T’J(r)||  is  utilized  as  control  zz,-  for  the  agent,  and  to,-  is  set  at  the 
value  with  which  the  orientation  of  the  agent  in  the  next  time  step 
is  consistent  with  the  direction  of  Fs( r).  Since  the  travel  distance 


and  search  time  are  not  considered  in  the  objective  of  our  prob¬ 
lem,  when  Ns( r)  =  0,  a  random  search  technique  in  discrete  time 
with  the  input  [«[(£),  w'  (k)]  is  utilized  to  navigate  the  agent  at  q 
with  searching  state: 

urj(k  +  1 )  =  u]  ( k )  +  b{k)At 
wrt(k+  1)  =  wri(k)  +  c(k)At  (6) 

where  b(k)  is  a  random  variable  having  uniform  distribution  with 
the  support  assigned  by  the  user,  c(k)  is  a  random  variable  having 
normal  distribution  with  a  zero  mean,  and  variance  specified  by 
the  user,  and  Ar  is  the  discrete  time  period.  When  other  objective 
such  as  travel  distance  is  included,  collaboration  between  agents 
may  be  used  to  improve  the  search  efficiency. 

When  a  point  lying  on  the  boundary  of  the  spill  is  within 
the  zth  agent  sensor’s  FOV,  a  detection  occurs.  The  zth  agent 
then  changes  to  a  pursuing  state  and  shares  the  location  of  the 
detected  point,  denoted  by  rrj,  with  other  agents  in  its  communi¬ 
cation  graph.  A  potential  field  f/j(r)  is  constructed  for  the  agent 
moving  towards  r,/.  It  is  given  by: 

Ud(r)  =  ^il2p(r,rd)2  (7) 

where  132  is  a  scaling  parameter,  and  p|r.  r,/)  is  the  distance  be¬ 
tween  r  and  r^.  Let  Np(r)  =  {/  |  Vi  £  Ip  U4,  r  £  fW]}  denote  the 
index  of  agents  in  pursuing  or  tracking  state,  in  whose  commu¬ 
nication  range  r  lies.  The  following  potential  field  is  constructed 
for  the  agent: 

Up(r)  =  Ud(r)+  £  Uj( r)  (8) 

ieNp(r) 

The  negative  gradient,  Fp( r),  of  the  potential  Up(r)  is  utilized  to 
navigate  the  agent  in  pursuing  state.  In  practice,  we  assume  that 
there  is  a  maximum  velocity,  vmax,  to  bound  the  movement  of 
each  agent. 

Tracking  the  Spill  Boundary 

Assume  that  the  agent  i  is  in  the  pursuing  state.  When 
||r,-  —  r,i |  <  d,  where  d  is  a  capture  distance  defined  by  the 
user,  the  agent  changes  into  the  tracking  state,  it  will  choose  to 
move  along  the  spill  boundary  where  the  inside  of  the  spill  is 
on  its  left.  The  forward  direction  of  the  agents’  movement  is 
as  shown  in  Fig.  2.  When  an  agent  moves  away  from  the  for¬ 
ward  direction,  its  velocity  is  negative.  In  this  case,  the  unicycle 
model  in  Eq.  (2)  can  still  be  used  only  by  adding  n  to  the  ori¬ 
entation.  For  agent  z,  is  defined  as  follows.  Assume  that  at 
time  r,  the  zth  agent  is  tracking  klj.  It  will  measure  the  length  of 
the  perimeter  and  positions  of  other  agents  in  the  tracking  state 
within  its  FOV.  Then,  if  the  closest  neighbor  at  the  forward  di¬ 
rection,  whose  index  is  represented  by  N( r,-),  is  inside  its  FOV, 
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Forward 

direction 


Figure  2.  THE  LOCATION,  FORWARD  MOVEMENT  DIRECTION,  AND 
DISTANCE  BETWEEN  ADJACENT  MOBILE  AGENTS  ALONG  THE 
SPILL  PERIMETER. 


set  li  =  ||r,-,r^(rj)||gjj  .;  otherwise  /,-  is  unknown  to  the  agent.  As¬ 
sume  that  d£lj  can  be  represented  by  a  function  s  =  /( r),  where 
s  £  [0,  ||3£2;||]  indicates  the  length  of  the  perimeter  from  a  ref¬ 
erence  point  to  r  along  the  forward  direction.  Write  /( r,)  as  s; 
and  s(rN(r.  A  as  s/+i,  then,  when  /,  is  known  to  agent  /,  it  can  be 
computed  as  follows: 


k  = 


$i-\- 1  $i 
Si+\  -Si+ 


if  si+ 1  >  Si 
if  si+ 1  <  St 


(9) 


Define  a  virtual  distance,  L,  between  agent  /  and  /V ( r, )  as: 


Li  = 


I L 


if  U  <  L 

if  lj  >  L  or  lj  is  unknown 


(10) 


where  L  is  the  influence  distance  between  agent  i  and  agent 
N(rj),  and  is  set  as  L  =  r$  in  our  paper. 

Assuming  Lo  =  LUj,  i.e.,  the  last  agent  wraps  to  the  first 
agent,  a  potential  function  [21]  [22]  is  proposed  for  agents  i, 
which  is  given  by: 

Vi(Li,Li-l)=k1(2ln(Li)  +  1^-)  (11) 

Li 


Figure  3.  THE  CHANGE  IN  0/  AT  EACH  TIME  STEP,  a)  THE  MOVE¬ 
MENT  OF  AGENT  AND  SPILL  HAVE  INTERSECTION,  IF  As;  >  0,  AN¬ 
GULAR  CHANGE  IS  A0,-;  IF  As;  <  0,  ANGULAR  CHANGE  IS  A'0;; 
b)  THE  MOVEMENT  OF  AGENT  AND  SPILL  HAVE  NO  INTERSEC¬ 
TION,  AGENT  MOVES  TOWARDS  THE  CLOSEST  BOUNDARY  POINT 
OF  THE  SPILL. 


used  as  the  linear  acceleration  input  of  the  /th  agent. 

kl(rt  ~  %1)  +  (v;+1  -2v;+v;-t)  ifL(  <L,L;_i  <  L 
u__<  ^l(i-%1)  +  (Vi-l-V;)  ifL;  =L,L;_1  <L 

+  (w+1  -  V;)  if  Li  <  L  ,  Li^i  =  L 

0  if  L;=L,L;_!=L 

(13) 

From  Eq.  (12)  and  Eq.  (13)  it  is  evident  that  when  Li  goes  to 
zero,  M;  goes  to  minus  infinity.  This  guarantees  that  the  /th  agent 
does  not  surpass  the  (i  +  l)th  agent,  which  implies  that  the  order 
of  the  agents  is  maintained  when  they  are  moving. 

The  angular  velocity  of  the  /th  agent  is  also  controlled  dis¬ 
cretely.  Assume  that  the  change  in  distance  traveled  by  the  /th 
agent  during  At  is  A Sj.  The  change  in  0,,  denoted  as  A0,  is  com¬ 
puted  in  different  situations  as  shown  in  Fig.  3. 

If  another  kind  of  agent,  denoted  as  ®,  which  has  a  much 
larger  measurement  range,  (e.g.,  an  airborne  agent)  is  available  to 
monitor  the  spill  when  the  multiple  robotic  agents  are  detecting 
the  spill,  its  measurement  can  be  used  as  a  coarse  measure  of  /;. 
This  measurement  of  /,  usually  has  a  lot  of  noise  due  to  the  low 
resolution  of  the  global  measurement.  In  that  case,  Eq.  (10)  may 
be  changed  to: 


where  k\  is  a  positive  constant.  When  Li  <  L  and  i  <  L,using 
Eq.  (9),  Eq.  (10)  and  Eq.  (11),  the  derivative  of  Vi  with  respect 
to  Si  is: 


if  lj  <  L 
if  /;  >  L 


(14) 


dVt 

dsi 


(12) 


where  lj  is  the  noisy  measurement  by  agent  (B.  This  case  will  be 
studied  in  our  future  work. 


Stability  Analysis 

where  Vi  has  a  minimum  at  L,  =  L;_i.  Including  the  control  of  To  investigate  the  system’s  stability,  we  consider  a  time  t  > 

velocity  for  agents,  the  following  controller  is  proposed  to  be  T  in  which  we  assume  that  all  agents  are  in  the  tracking  state,  and 
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the  spills  stop  changing  in  size  and  location.  Another  assumption 
is  that  the  number  of  agents  is  large  enough  so  that  each  agent  is 
connected  with  its  leading  and  trailing  neighbors  during  tracking. 
Without  loss  of  generality,  we  relabel  the  agents  so  that  the  first 
n\  agents  track  £2/,  the  (n  \  +  l)th  to  (ii  \  +  «2)th  agents  track  kl2, 
and  so  forth.  Let  1  =  \L\ .  L2. . . .  ,Ln]T .  The  collective  potential  of 
the  system  can  be  given  by: 


V(s)=V(l)  = 


E 


j= i 


f  n0+...+rij  \ 

E  vi 

\i— «o+***+ny-i+l  / 


(15) 


where  s  =  [si, . . . , .v„]7';  p  is  the  number  of  spills  when  the  spills 
stop  changing;  no  equals  zero;  and  with  M \ .  M2,  . . .  .Mp  being 
the  lengths  of  the  p  spill  boundaries,  L\  ,L2.. . .  ,Ln  satisfies: 


Figure  4.  AN  EXAMPLE  OF  THE  COLLECTIVE  POTENTIAL  VALUE 
CHANGING  WITH  L\  AND  L2. 


L\  -\-L2  + . . .  -\-Lni  —  M\ 

Ln  !  +  l  +^«i+2  +  •  ■  •  +Ln1+n2  =  Ml 


(16) 


l.  Aii+.-.+np-i+l  L‘n\+...+np_i+2  +  . . .  -\-Ln  —  Mp 

The  collective  dynamics  of  the  system  is  given  by: 


s  v 

.  =  _r3VL  dV2  dVn  T 
3s  1  ’  ds2  ’  '  ds„ 


£(s)y 


(17) 

(18) 


Then 


H(s,y)  =VV(s)ry  -  [^, . . . ,  ^]Ty-yT£(s)y 

_rdvni  dv2  av„-i  ,  dV„-np+1  r 


ds, 

2  Ln- 2 


}y-y1  L(s)y 
1 


3s  1  ’  ’  dsn 

j  r  2  Ln  !_1  1 

|  Ln  1  L2l]  L2  Ln  -  1  L?n  _  |  f  1  j 

- yTL(s)y  (21) 


where  L  has  the  form 


where  v  =  [vj, . . . ,v„]r,  and  L  is  the  graph  Laplacian  as  de¬ 
scribed  in  references  [14]  and  [23]. 

Using  Lagrange  Multipliers,  Eq.  (15)  has  an  extreme  point 
at 


/A 


L  = 


A 


lpJ 


(22) 


A=A  =  ...=Lni  =  f 

j  _  j  _  _  j  _  M; 

i  +  l  —  *-m\+  2  —  •  •  •  —  -^ni+«2  — 

Al+...+np-i  +  l  =  Lni+...+np_i+2  =  ■ 

Although  it  is  analytically  difficult  to  show  that  this  point  is  the 
only  minimum  for  Eq.  (15),  numerical  simulations  with  matlab 
support  this  assumption.  For  example,  when  n  =  3,  M  =  24, 
p  =  1,  and  k  =  1,  the  value  of  V  is  shown  in  Fig.  4,  where  the 
minimum  of  V  appears  at  L\  =  L2  =  L\  =  8  and  satisfies  Eq. 
(19).  Eq.  (19)  is  satisfied  for  other  values  of  n,  M,  and  p  also  as 
verified  by  simulations. 

Let  H(s,y)  represent  the  energy  of  the  system,  i.e.. 


(19) 


_  Mp 
„p 


which  is  a  positive  semidefinite  block  diagonal  matrix  with  rank 
n  —  p,  and  Lj  has  a  dimension  nj.  In  general,  H  may  be  positive, 
this  is  caused  by  the  case  that  only  information  of  the  closest 
leading  and  trailing  neighbors  is  utilized  to  generate  the  potential 
function  for  agent  i.  Therefore,  for  agent  i,  no  information  on 
Vi- 1  and  V,+i  is  available.  However,  with  bounded  v  and  the 
control  in  Eq.  (13),  each  L,-,  where  i  =  1,2 tends  to  be 
close  to  each  other,  and  then  the  first  set  of  terms  in  curly  brackets 
on  the  right  side  of  Eq.  (21)  gets  close  to  zero  which  makes  the 
sign  of  H  dominated  by  —  v7L(s)v  and  be  negative. 

In  this  case,  H  equals  zero  when  v  is  an  eigenvector  of  L 
corresponding  to  eigenvalue  zero.  Then  v  will  be  in  the  space 
with  basis  containing  p  vectors,  denoted  by  b(-  =  [ci,C2, . . . 
where 


tf(s,v)=V(s)  +  -* 

z  (=t 


(20) 


1  n0  +  . .  .+«,■_!  + 1  <  j  <  n0+  . . .  +w/ 
0  otherwise 


(23) 
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for  i  =  1,2 and 


v  —  A-ibi  +  A,2b2  +  ■  •  •  +  A,pbp  (24) 

where  A i  =  1,2 is  an  arbitrary  parameter,  which  in  this 
case  represents  the  steady  linear  velocity  of  all  agents  on  3£2,. 
This  shows  that  the  velocities  of  all  agents  tracking  the  same  spill 
converge  to  the  same  value.  Since  v  is  bounded,  the  convergent 
velocity  is  also  bounded.  When  H  achieves  a  minimum,  V  is  also 
at  its  local  minimum.  From  previous  analysis, 

T«i+...+n;-_i+i  =  •  •  •  =  T«i+. for  j  =  1,2, ...  ,p  (25) 

is  a  local  minimum  for  V.  Hence,  the  system  achieves  the  sta¬ 
ble  state  with  equal  distance  between  adjacent  agents  and  equal 
velocity  for  all  agents  tracking  the  same  spill. 

Consider  a  case  that  after  all  agents  change  to  tracking  state, 
spill  does  not  have  enough  agents  tracking  along  its  boundary, 
i.e.,  n  |  <  We  claim  that  all  agents  tracking  converge  to 

the  same  velocity  and  the  distance  between  each  pair  of  adjacent 
agents  converges  to  a  value  no  less  than  L.  The  energy  of  the 
agents  tracking  is  presented  as: 

Hi(  s1,v1)=V(s1)  +  i£v?  (26) 

L  i=i 

Similarly: 


(e)  (f) 

Figure  5.  SIMULATION  RESULT  WITH  15  AGENTS  AND  ONE  SPILL, 
(a)  SNAP  GRAPH  AT  TIME  STEP  ONE:  (b)  SNAP  GRAPH  AT  TIME 
STEP  600;  (c)  SNAP  GRAPH  AT  TIME  STEP  5000;  (d)  TRAJECTORY 
OF  AGENTS;  (e)  DISTANCE  BETWEEN  EACH  AGENT  AND  THE  CLOS¬ 
EST  AGENT  AHEAD  OF  IT;  (f)  THE  VELOCITY  OF  EACH  AGENT. 


H(si,Vi) 


2  Ln  j  — i 

L  I2 

^ n\ 

-v[i:(si)vi 


1  2 


(27) 


within  its  graph  to  be  L  to  have  a  minimal  energy  also.  There¬ 
fore,  the  distance  between  each  pair  of  adjacent  agents  tracking 
fli  is  not  less  than  L. 


where  Si  =  [si,...,sBl]  and  vi  =  [vi, . . . , vnJ.  Without  loss  of 
generality,  we  assume  that  at  time  t,  L\  is  a  block  diagonal  ma¬ 
trix  containing  two  blocks  L\  \  and  L\i.  In  view  of  the  discussion 
above,  if  no  connection  between  L\\  and  Lyi  holds,  the  velocity 
of  agents  in  L\\  converges  to  v  ,  and  velocity  of  agents  in  L\i 
converges  to  v2.  If  v1  /  v2,  the  two  connected  graphs  will  con¬ 
nect  with  each  other  some  time  since  they  are  tracking  a  perime¬ 
ter.  Then  Eq.  (27)  is  likely  to  be  negative  and  cause  the  energy 
of  the  system  to  decrease  again  until  either  the  graph  splits  into 
several  connected  graphs  or  the  velocity  of  all  agents  converges 
to  the  same  value.  Since  different  velocities  of  agents  in  var¬ 
ious  connected  graphs  cause  a  decrease  of  the  system  energy, 
the  system  energy  achieves  a  local  minimum  when  the  agents 
within  all  connected  graphs  tracking  have  the  same  velocity. 
In  this  case,  the  first  agent  at  the  forward  direction  of  each  con¬ 
nected  graph  will  have  a  leading  virtual  distance  equal  to  L  (i.e., 
the  distance  between  this  agent  and  its  leading  adjacent  agent  is 
larger  than  L),  which  causes  the  distance  between  adjacent  agents 


SIMULATION  AND  RESULTS 

In  this  section,  simulations  are  described  to  test  the  perfor¬ 
mance  of  the  proposed  method  based  on  different  conditions. 
The  influence  of  spill  locations,  size  and  numbers  as  well  as  the 
agent  group  size  are  studied  via  a  number  of  simulations. 

Sensitivity  to  Changes  in  Size  and  Location  of  Spill 

In  this  subsection,  we  design  a  simulation  to  see  how  the 
agent  group  performs  when  the  spill’s  location  and  size  can 
change.  We  will  show  that  our  method  can  search  and  track  the 
spill  successfully.  We  assume  that  15  agents  are  randomly  dis¬ 
tributed  in  the  environment  at  the  beginning.  ‘IT'  is  a  square  arena 
with  sides  of  length  40  meters,  r$  =  6m,  =  8 m,  vmax  =  ‘Hm/s, 
and  At  =  0.02 s.  The  results  of  one  run  are  shown  in  Fig.  5. 
From  the  results,  especially  5(d),  we  can  see  that  the  agents  in  the 
tracking  state  can  follow  the  change  in  the  location  and  size  of 
the  spill.  We  assume  that  for  each  agent  i,  before  changing  to  the 
tracking  state,  L,  equals  zero.  Therefore,  in  Fig.  5(e)  each  jump 
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Figure  6.  SIMULATION  RESULT  WITH  8  AGENTS  AND  ONE  SPILL, 
(a)  SNAP  GRAPH  AT  TIME  STEP  ONE;  (b)  SNAP  GRAPH  AT  TIME 
STEP  320;  (c)  SNAP  GRAPH  AT  TIME  STEP  3500;  (d)  TRAJECTORY 
OF  AGENTS;  (e)  DISTANCE  BETWEEN  EACH  AGENT  AND  THE  CLOS¬ 
EST  AGENT  AHEAD  OF  IT;  (f)  THE  VELOCITY  OF  EACH  AGENT. 


of  Lf  from  zero  indicates  that  one  agent  changes  to  the  tracking 
state.  As  shown  in  Fig.  5(e)  and  Fig.  5(f)  during  the  process, 
when  an  agent  arrives  at  the  spill  and  changes  to  the  tracking 
state,  it  causes  change  in  the  velocity  and  virtual  distance  of  the 
group,  and  after  that  the  agent’s  new  group  starts  to  converge  to 
a  new  stable  state.  After  all  the  agents  are  in  the  tracking  state, 
the  whole  agent  group  converges  to  the  same  velocity  and  with 
the  same  distance  between  adjacent  agents. 

Sensitivity  to  Number  of  Agents 

In  this  subsection,  we  assume  the  spill’s  location,  shape  and 
size  are  fixed,  and  the  purpose  is  to  test  how  the  proposed  method 
performs  when  there  are  not  enough  agents  in  the  environment. 
In  this  simulation,  we  assume  that  W  is  a  square  arena  with 
sides  of  length  30  meters,  r$  =  5m,  r,^  =  8 m,  vmax  =  %m/s,  and 
Af  =  0.02,v.  The  results  of  one  run  are  shown  in  Fig.  6.  From 
the  results  we  can  see  that  the  number  of  agents  is  not  enough  to 
keep  all  agents  in  one  connected  graph  when  tracking  the  spill. 
As  time  goes  on,  all  the  agents  change  to  the  tracking  state,  the 
virtual  distance  of  each  agent  converges  to  L  =  5m,  and  their  ve¬ 
locity  converges  to  about  4.5 m/s.  From  Fig.  6(c)  we  can  see 
that  the  agents  form  two  connected  graphs  with  the  distance  be¬ 
tween  adjacent  agents  in  each  graph  equalling  L.  It  can  be  seen 


Op; - T - 
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°  o.  o-:  - 
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Figure  7.  SIMULATION  RESULT  WITH  CHANGING  SPILL  NUMBER, 
(a)  SNAP  GRAPH  AT  TIME  STEP  ONE;  (b)  SNAP  GRAPH  AT  TIME 
STEP  52;  (c)  SNAP  GRAPH  AT  TIME  STEP  400;  (d)  SNAP  GRAPH 
AT  TIME  STEP  682;  (e)  SNAP  GRAPH  AT  TIME  STEP  690;  (f)  SNAP 
GRAPH  AT  TIME  STEP  5000;  (g)  DISTANCE  BETWEEN  EACH  AGENT 
AND  THE  CLOSEST  AGENT  AHEAD  OF  IT;  (h)  THE  VELOCITY  OF 
EACH  AGENT. 


in  Fig.  6(e)  that  at  about  the  2600th  time  step,  there  is  a  slight 
variation  at  L,  .  It  may  be  caused  by  the  discretization  of  the  time 
in  simulation. 


Sensitivity  to  Change  in  the  Number  of  Spills 

In  this  subsection,  we  build  a  simulation  to  see  how  the  agent 
group  adapts  the  change  of  the  spill  number.  In  this  environment, 
we  assume  that  there  are  two  spills  at  the  beginning.  As  time  goes 
on  the  two  spills  merge  into  one  spill,  and  then  again  split  into 
two  spills.  It  is  assumed  that  the  number  of  agents  is  26,  TF  is  a 
square  arena  with  sides  of  length  40  meters,  r$  =  5m,  =  8 m, 
Vmax  =  8 m/s,  and  Ar  =  0.01s.  The  results  are  shown  in  Fig.  7. 
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From  the  snaps  of  the  simulation  process,  we  can  see  that  the 
agents  successfully  search,  pursue  and  track  the  spills  no  matter 
if  they  are  merged  or  they  split  into  pieces.  After  the  spills  stop 
changing,  the  velocities  of  agents  tracking  the  same  spill  con¬ 
verge  to  the  same  value  as  well  as  maintaining  constant  distance 
between  adjacent  agents. 

CONCLUSIONS 

In  this  paper,  a  hybrid  system  with  feedback  control  strat¬ 
egy  is  advanced  for  a  group  of  agents  with  a  limited  sensor  field 
of  view  (FOV)  and  communication  range  to  search  and  track  a 
number  of  spills  in  an  unknown  environment.  A  hierarchical  po¬ 
tential  field  is  designed  for  agents  in  different  states  to  control 
their  movement.  Simulation  results  show  that  the  agents  can  suc¬ 
cessfully  search  and  track  spills  whose  location,  shape,  size,  and 
number  may  change  over  time. 
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A. 2.9  An  Intelligent  Sensing  Network  of  Distributed  Swarm¬ 

ing  Agents  for  Perimeter  Detection  and  Surveil¬ 
lance 

The  following  paper  (on  the  next  8  pages)  was  co-authored  by  Gregory  K.  Fricke, 
Guoxian  Zhang,  Adam  W.  Caccavale,  S.  Walter  Li,  and  Devendra  P.  Garg,  and 
appeared  in  the  Proceedings  of  the  2010  ASME  Dynamic  Systems  and  Control 
Conference,  Volume  2,  on  pages  741-748,  published  and  presented  in  September 
2010. 
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ABSTRACT 

Perimeter  detection  and  surveillance  by  a  distributed  swarm 
of  autonomous  agents  offers  a  solution  in  multiple  cases  where 
human  safety  or  environmental  protection  is  of  concern.  An  algo¬ 
rithmic  method  is  developed  for  detection  and  sun’eillance  of  one 
or  more  dynamic  perimeters  via  distributed  control  under  local- 
area  sensing  and  communication.  Local  graphs  among  neigh¬ 
bors  are  formed  based  on  proximity,  at  which  point  communica¬ 
tion  among  graph  members  becomes  possible.  Hardware-in-the- 
loop  mixed-simulation  experiments  validate  the  method  utilizing 
iRobot  Create ®  ground  robots  and  a  NaturalPoint  OptiTrack  vi¬ 
sion  system. 


NOMENCLATURE 

a(t)  Linear  acceleration  control  input  at  time  t 
©(f)  Angular  velocity  control  input  at  time  t 
X  (t )  Dynamic  state  vector  for  the  ground  robots 
Is  Set  of  indices  of  agents  in  Search  state 

Ip  Set  of  indices  of  agents  in  Pursuit  state 

It  Set  of  indices  of  agents  in  Track  state 
N  Total  number  of  agents 

Ns(r)  Set  of  agents  influencing  Search-state  agent  at  r 
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Set  of  agents  influencing  Pursuit-state  agent  at  r 
Set  of  agents  influencing  Track-state  agent  at  r 
Repulsive  potential  field  at  r  due  to  robot  i 
Repulsive  potential  field  by  a  Search-state  agent 
Repulsive  potential  field  by  a  Pursuit-state  agent 
Potential  field  by  a  Track-state  agent 
Total  potential  field  acting  on  an  agent  at  r 
2D  coordinates  on  the  plane 
Euclidean  distance  between  two  coordinates 
Influence  range  for  repulsive  potential  functions 
Maximum  communication  range  for  robots 
Range  at  which  robots  can  sense  spills 
Range  at  which  robots  can  sense  each  other 
Discrete-time  sample  index 
Shaping  constants  for  potential  functions 


INTRODUCTION 

A  primary  motivation  for  the  application  of  robotic  technol¬ 
ogy  is  to  replace  humans  in  hazardous  or  repetitive  conditions. 
Such  applications  include  monitoring  the  perimeter  of  forest  fires 
and  land-  and  waterway-based  hazardous  material  spills.  Due 
to  the  hazardous  conditions  present,  fully-distributed  control  of 
independent-but-cooperative  swarming  agents  offers  an  ideal  fit 
in  view  of  its  inherent  redundancy  and  graceful  degradation  [1], 
Detection  and  localization  of  hazardous  materials  has  been  at 
the  forefront  of  robotic  exploration  research  for  many  years  (see, 
e.g.,  [2]  and  included  citations).  Forest  fire  perimeter  monitoring 
by  UAVs  was  explored  by  Casbeer,  Kingston  et  al  [3,4]. 

Additional  perimeter  detection  and  monitoring  research  is 
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reported  by  Clark  etal  [5],  Cruz  etal  [6],  and  Bruemmer  etal  [7], 
Each  of  these  works  make  use  of  potential  functions  for  robotic 
navigation  during  some  part  of  the  algorithm. 

Recent  simulation-based  research  has  been  conducted  [8]  in 
the  context  of  perimeter  tracking  and  surveillance  under  the  con¬ 
dition  that  a  perimeter  may  be  changing  or  even  splitting  into 
multiple  distinct  areas.  The  work  detailed  in  this  paper  is  fo¬ 
cused  on  complementary  experimental  validation  of  the  perime¬ 
ter  tracking  method  developed  in  that  recent  work. 

Many  simplifying  assumptions  are  made  in  the  conduct  of 
simulations.  For  example,  sensor  fields  of  view  (FOVs)  are  as¬ 
sumed  to  be  perfect  circles  with  no  change  in  performance  with 
range.  Communication  is  assumed  to  be  perfect  and  instanta¬ 
neous.  The  robots  are  assumed  to  be  point-sized,  and  collisions 
only  occur  if  robots  exactly  overlap  one  another.  Robot  control 
may  use  a  unicycle  model  taking  into  account  robot  heading,  but 
changes  in  heading  -  regardless  of  magnitude  -  are  accomplished 
in  a  single  time-step  with  no  error. 


SEARCH 
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Spill  detected  by  local 
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Figure  1.  STATE-TRANSITION  DIAGRAM. 


Problem  Statement 

The  goal  of  this  work  is  to  quantify  the  performance  of  N  au¬ 
tonomous  ground  vehicles  performing  perimeter  de¬ 

tection  and  surveillance  in  a  confined  but  unknown  planar  region 
Tf’.  A  set  of  robots  will  be  distributed  in  an  a  priori  unknown 
environment  containing  at  least  one  region  il  with  boundary 
8Q.  The  starting  configurations  of  the  robots  (q,  =  [x,-,  y,,  0,  l  ) 
are  randomized.  The  robots  should  explore  the  environment  in 
search  of  the  region.  Upon  detection,  the  boundary  should  be 
followed.  When  multiple  robots  are  following  the  same  bound¬ 
ary,  they  should  converge  to  the  same  linear  velocity  and  exhibit 
constant  spacing  between  any  pair  of  robots.  If  these  conditions 
are  met,  the  experimental  mission  shall  be  marked  successful. 

Effects  of  various  parameters  on  the  swarm  performance  are 
investigated.  The  parameters  of  interest  in  this  case  are  the  sens¬ 
ing  and  communication  ranges  between  robots  (which  are  sim¬ 
ulated  and  can  be  easily  changed  for  different  experiment  in¬ 
stances  and  on  a  per  robot  basis)  and  the  number  of  robots  par¬ 
ticipating  in  the  task.  The  success  rate  and  time-to-completion 
under  these  parametric  variations  should  be  evaluated  for  notable 
trends. 


APPROACH 

Control  of  the  robots  is  accomplished  via  fully  distributed 
sensing  and  computation.  The  only  explicit  cooperation  is  un¬ 
der  the  sharing  of  information  within  a  robot’s  local  connectivity 
graph.  The  information  yielded  under  this  architecture  of  limited 
communication  may  be  considered  another  type  of  sensing. 

Control  Laws 

The  ground  robots  used  in  this  experiment  are  differential- 
drive  robots  with  a  front-castor  wheel.  The  dynamic  model  for 
such  robots,  frequently  called  unicycle  robots,  is  given  in  Eq.  (1). 


The  control  inputs,  a(t)  and  CO (f),  represent  the  linear  accelera¬ 
tion  and  angular  velocity,  respectively. 
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The  control  law  applied  to  the  robot  (e.g.,  the  construction 
of  a(t)  and  ©(f))  depends  on  the  robot’s  individual  state.  The 
control  modes  are  Search,  Pursuit,  and  Track.  In  the  software 
implementation,  these  modes  are  extended  with  Idle  and  Error 
states  for  practical  considerations.  The  algorithmic  state  transi¬ 
tions  for  an  individual  robot  are  shown  in  Fig.  1 . 

Algorithmically,  the  three  control-oriented  states  are  identi¬ 
cal  in  the  context  of  which  functionalities  are  executed.  How¬ 
ever,  the  data  processing  and  control  laws  are  distinct  for  each 
state.  In  all  states,  collision  avoidance  between  robots  must  be 
ensured.  Thus,  in  all  control  states,  a  repulsive  potential  field 
(see,  e.g.,  [9-11])  is  generated  by  each  robot.  The  extent  to  which 
a  robot  is  affected  by  the  generated  potential  fields  depends  on 
the  priority  level  of  the  robots,  which  is  defined  by  the  robot’s 
current  state.  Robots  in  Track  state  have  the  highest  priority,  and 
thus  will  only  be  repulsed  by  other  robots  in  Track  state.  Robots 
in  Search  state  have  the  lowest  priority,  meaning  that  they  will 
be  repulsed  by  all  robots.  Robots  in  Pursuit  state  are  repulsed  by 
others  in  Pursuit  or  Track  state,  but  not  by  those  in  Search  state. 

For  agent  i  at  location  r,  the  repulsive  field  generated  is  given 
in  Eq.  (2).  The  function  p(.  ,  .)  is  a  vector  difference  function, 
and  po  is  the  range  of  influence  of  the  repelling  field. 


Ui(r) 


if  P(r,r,)<po 
0  if  p(r,r,)  >  po 
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Search  State  A  robot  that  is  in  Search  state,  but  has  not 
detected  a  spill  or  any  other  robot,  will  explore  its  environment 
according  to  a  stochastic  navigation  process.  For  a  given  agent 
j  at  r t  in  Search  state,  the  set  of  agents  within  communication 
range  pc  is  given  by  Ns(rj)  =  {i|Vi,p(r,-,r,-)  <  pc,t  ^  j}. 

When  Ns( r,-)  =  0,  the  navigation  control  is  a  random  ex¬ 
citement  given  in  Eq.  (3).  The  random  variables  b  and  c  are 
defined  by  the  user,  and  are  uniformly  distributed,  respectively. 
The  center  of  the  uniform  distribution  for  b  is  slightly  positive, 
resulting  in  forward  motion  over  time,  whereas  for  c  the  center 
is  0  such  that  there  is  no  preferential  direction  for  the  robots. 
Note  that  these  control  variables  in  discrete  time  map  to  the  con¬ 
tinuous  control  parameters  defined  earlier,  i.e.,  it(k)  ~  ait)  and 
w(k)  ~  co (f). 


Ui(k+  1)  =  Ui[k)  +  b(k)At 

Wj(k+  1)  =  Wi(k)  +  c(k)£t 

Important  sensing  events  relative  to  the  search  task  includes 
the  detection  of  the  spill  or  detection  of  a  neighbor.  If  another 
robot  is  detected,  Ns  ( r; )  becomes  non-empty  and  an  obstacle 
avoidance  component  is  added  to  the  control. 

The  total  repulsive  field  experienced  by  agent  j  for  Ns  /  0  i  s 
thus  the  sum  of  the  fields  of  the  agents  within  range,  as  shown  in 
Eq.  (4). 


U(rj)  =  £  Ui(rj)  (4) 

ieNs 


The  negative  gradient  of  this  summed  field,  Fj  =  —  V iJ  ( r  7j 
yields  the  collision-avoidance  control  input,  where 


Uj(k)  =  \\Fj(k)\\  (5) 


w(k)  =  - 


At 


(6) 


range  is  identified  as  part  of  a  spill.  Alternatively,  a  member  of 
the  robot’s  local  communication  graph  may  have  entered  Pursuit 
(or  Track)  state,  indicating  communicable  knowledge  of  a  spill. 
The  detected  point,  r(/,  is  made  available  to  all  members  of  the 
local  graph.  The  control  law  to  navigate  the  robot  to  this  detec¬ 
tion  point  is  given  in  Eq.  (7). 


Ud{r)  =  ^P{  P(r,rd))2  (7) 

Using  only  this  control  law  would  likely  cause  frequent  col¬ 
lisions,  as  it  is  possible  that  multiple  robots  would  receive  the 
same  detected  point,  Yj,  as  a  goal.  As  those  robots  converge  on 
that  point,  collisions  could  be  difficult  to  avoid.  Thus  collision 
avoidance  very  similar  to  that  of  Eq.  (2)  and  Eq.  (4)  is  calculated, 
except  that  the  summation  is  taken  over  the  set  Np  to  yield  poten¬ 
tial  Up.  The  set  of  neighbors  that  influence  a  robot  in  Pursuit  state 
at  coordinates  r  is  given  byAp(r)  =  {/|V7  £  /pU/r,p(r  ,U)  <  rc}. 
The  total  potential  field  is  then  given  by  U (r)  =  t/f/(r)  +  I/„(r), 
and  the  negative  gradient  Fp{ r)  =  —  VI/ (r)  defines  the  control 
input  just  as  in  Eq.  (5)  and  Eq.  (6). 

In  [8],  the  transition  from  Pursuit  to  Track  state  occurs  when 
the  robot  is  sufficiently  close  to  the  spill.  In  this  experimental 
case  the  robots  are  limited  to  the  downward  looking  IR  sensors 
mounted  on  their  front  bumpers,  implying  that  the  robot  cannot 
detect  the  spill  until  it  is  already  sufficiently  close  to  enter  Track 
state.  Thus  the  time  spent  in  Pursuit  state  in  these  experiments 
will  be  only  one  time-step  if  the  robot  has  directly  detected  the 
spill.  The  cases  where  a  robot  remains  in  Pursuit  state  for  ex¬ 
tended  time  are  when  spill  detection  occurs  by  another  member 
of  the  local  communication  graph. 

If  the  robots  had  cameras  or  forward-looking-IR  sensors, 
their  spill-sensing  range  could  be  extended  to  more  closely 
mimic  the  behavior  simulated  in  [8],  although  the  FOV  in  these 
cases  would  be  limited  to  the  region  in  front  of  the  robot  rather 
than  a  circular  region  around  the  robot.  Alternatively  an  aerial 
observer  could  be  utilized  to  provide  coarse  detection  informa¬ 
tion  regarding  the  spill  to  broadcast  to  the  distributed  robots, 
guiding  their  search.  Both  of  these  situations  are  discussed  later 
in  this  paper. 


For  Ns(yj)  =  0,  the  control  is  governed  by  the  controller  spe¬ 
cific  to  the  state.  In  Search  state,  the  control  is  dictated  by  neigh¬ 
bor  separation:  either  the  control  is  completely  random,  or  is 
dictated  by  the  proximity  of  another  robot. 

If  a  spill  is  detected  by  the  robot  or  another  member  of  the 
local  graph,  the  robot  will  transition  to  Pursuit  state  using  the 
detected  point,  r,/,  as  a  navigational  goal. 

Pursuit  State  In  Pursuit  state,  the  robot  acts  on  some  in¬ 
formation  to  guide  it  to  the  spill.  This  information  may  be  a 
spill-detection  event,  wherein  a  point  within  the  robot’s  sensing 


Track  State  The  Track  state  controller  is  engaged  when 
the  robot  is  sufficiently  close  to  the  spill  perimeter  that  its  own 
sensors  can  closely  follow  the  edge.  Once  in  Track  state,  the 
heading  will  thus  be  determined  by  the  curvature  of  the  perime¬ 
ter  to  be  tracked.  The  problem  of  achieving  balanced  and  dis¬ 
tributed  coverage  of  the  entire  perimeter  leads  to  a  different  po¬ 
tential  function  for  control.  The  control  parameters  of  interest  in 
Track  state  are  the  distances,  i,  to  the  nearest  leading  and 
trailing  neighbors  along  the  perimeter,  as  well  as  the  velocities 
of  the  robot  and  those  neighbors. 

Defining  an  influence  distance  of  L  =  po,  the  distance  L,  to 
be  used  in  the  control  is  defined  as  shown  in  Eq.  (8). 
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Figure  2.  A  REPRESENTATION  OF  THE  REGION  OF  EXPLORATION. 
THREE  ROBOTS  ARE  SHOWN  TO  SCALE. 


f  ii  if  it  <  L 

\  L  if  li  >  L 
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The  virtual  distance  may  then  be  used  to  define  the  controller  for 
the  robot,  as  shown  in  Eq.  (9).  For  derivation  of  this  controller 
from  the  potential  function,  see  [8], 


%(£  —  “gr)  +  (vi+i  —  2v,+v,_i) 

if  Lj  <  L,  Li- 1  <  L 

if  Lj  =  L,  i  <  L 

if  L\  <  Zy,  Li—  i  —  L 

0 

if  Lj  =  Z>,  Li—  i  —  L 

(9) 


Testbed 

The  workspace  for  these  experiments  is  confined  to  a  planar 
region  as  shown  in  Fig.  2.  The  spills  are  simulated  by  covering 
some  area  of  this  region  with  a  material  with  contrasting  color. 
Several  different  configurations  of  spills  are  considered  in  this 
work,  including  convex  and  non-convex  shapes,  shapes  with  dif¬ 
ferent  areas,  and  contiguous  or  non-contiguous  areas. 

The  agents  in  these  experiments  are  iRobot  Create®  robots, 
which  are  differential-drive  robots  with  wheelbase  of  258 mm. 
Control  of  these  robots  is  accomplished  via  serial  communica¬ 
tion  utilizing  iRobot’s  “Open  Interface”  specification.  The  Open 
Interface  specification  allows  access  to  all  of  the  robot’s  sensor 
data.  These  robots  have  an  external  diameter  of  35 cm;  in  the 
current  implementation,  the  calculation  of  avoidance  potential 
functions  does  not  take  this  into  account,  so  the  robots  do  oc¬ 
casionally  collide  (as  indicated  by  their  bump  sensors).  When  a 
bump  occurs  (with  an  obstacle  or  a  bump),  the  robots  back  up 
and  turn  a  small  amount. 


•  % 


Sync:  Hardware 


5 


Figure  3.  PERSPECTIVE  FROM  CAMERA  #5  WITHIN  THE  SYSTEM. 


The  robots  are  wirelessly  controlled  via  an  off-board  com¬ 
puter.  The  robots  utilize  the  Bluetooth  Adapter  Module  from 
Element  Direct,  made  specifically  for  Create  robots.  In  this  ar¬ 
chitecture,  the  decision-making  processes  of  all  of  the  robots  can 
be  accomplished  on  the  same  physical  computer,  simplifying  the 
development  process.  The  processes  remain  distinct,  however, 
and  the  control  could  equivalently  be  physically  distributed  to 
distinct,  dedicated  computers. 

The  spill  sensors  utilized  are  the  Create’s  built-in  down¬ 
looking  IR  proximity  sensors,  called  “cliff  sensors”.  The  cliff 
sensors  are  used  in  the  nominal  operational  functions  provided 
on  the  robot  as  safety  logic  to  detect  stairs  or  other  drop-offs  to 
be  avoided.  For  this  experiment,  the  full-range  sensor  data  will 
be  utilized  rather  than  the  Boolean  data,  allowing  calibration  for 
different  “colors”  of  spills. 

A  simulated  sensor  for  each  robot  is  provided  by  the  arena’s 
embedded  motion  capture  system.  With  the  ability  to  track  each 
robot’s  position  and  orientation,  the  motion  capture  system  yields 
GPS-like  localization  information  as  well  as  nearest-neighbor 
data.  The  motion  capture  system  from  NaturalPoint,  called  Opti¬ 
Track,  is  comprised  of  active-illumination  infrared  cameras  and 
a  software  package,  providing  6-DOF  data  for  defined  sets  of 
markers  within  the  field  of  view.  Each  robot  is  equipped  with 
three  reflective  markers  (each  with  a  different  pattern  for  identi¬ 
fication;  see  [12, 13]  for  alternative  solutions)  to  provide  full  6-D 
measurement  capability.  A  sample  active-IR  image  of  multiple 
robots  in  the  field  is  shown  in  Fig.  3.  The  markers  are  clearly 
visible  in  this  full-frame  image,  as  are  the  spill  and  robot  shapes. 
The  robots  in  this  experiment  are  constrained  to  a  plane,  thus  the 
out-of-plane  distance  measurement  and  the  roll  and  pitch  orienta¬ 
tion  measurements  are  unnecessary  (and  should  be  unchanging). 

Due  to  the  lack  of  available  sensors  on  the  Create  to  sense 
neighbors,  the  motion  capture  system  provides  an  external  sur¬ 
rogate.  Upon  capturing  a  frame  of  data,  the  system  computes 
all  robot-to-robot  distances.  Based  on  an  input  parameter  (pc  in 
later  sections),  the  system  applies  a  threshold  to  determine  which 
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Figure  4.  DIAGRAM  OF  PHYSICAL  AND  VIRTUAL  BOUNDARIES  IN 
THE  MULTI-ROBOT  CONTROL  ARCHITECTURE. 

robot  can  “hear”  each  other,  e.g.,  those  robots  who  are  within 
communication  range.  The  robots  individually  access  their  own 
connectivity  data  from  the  motion  system,  and  can  then  “receive” 
messages  from  those  robots.  This  communication  mapping  de¬ 
fines  the  robots’  local  graphs.  The  motion  capture  system  addi¬ 
tionally  applies  the  user-specified  pr  to  determine  which  robots 
can  “see”  each  other. 

A  diagram  of  virtual  and  physical  boundaries  is  given  in 
Fig.  4.  Note  that  while  all  of  the  processing  occurs  on  the  same 
computer,  the  processes  are  completely  independent  of  one  an¬ 
other  except  for  the  simulated  broadcast  messages. 

EXPERIMENTS 

The  experimental  environment  may  be  described  as  mixed- 
simulation ,  referring  to  the  use  of  real  hardware  and  real  sen¬ 
sors  combined  with  some  simulated  components.  The  simulated 
components  are  intended  to  allow  greater  configuration  control 
and  also  reduce  the  financial  burden  and  system  integration  and 
development  time  required. 

Setup 

Each  experiment  utilizes  between  three  and  five  autonomous 
ground  vehicles  and  one  or  two  simulated  spills.  The  “spills” 
in  this  experiment  are  formed  from  black  material  laid  on  the 


Figure  5.  A  WORKSPACE  WITH  FIVE  IROBOT  CREATES,  ONE  NON- 
CONVEX  SPILL,  AND  THE  CONTROL  COMPUTER. 

floor.  Two  different  shapes  are  used,  and  other  shapes  are  con¬ 
structed  by  allowing  spills  to  overlap.  In  Fig.  5,  five  Create 
robots  and  one  non-convex  spill  are  visible,  alongside  the  “su¬ 
pervisor/switchboard”  computer.  Note  that  due  to  the  sensor  con¬ 
figuration  on  the  Create  robots,  the  “tracking”  portion  of  the  al¬ 
gorithm  becomes  an  edge-following  problem.  Experiments  with 
changing  spill  contours  are  conducted  by  manipulating  one  piece 
of  material  in  an  overlapped  configuration. 

For  the  experiments  reported  here,  the  communication  range 
is  pc  =  2 m,  and  the  avoidance  influential  range  po  and  track-state 
influence  range  are  po  =  L  =  hn. 

Results 

Results  are  presented  from  several  experimental  sets.  The 
experiment  shown  in  Fig.  6  utilizes  five  robots  and  a  shrinking 
spill.  Initially,  the  spill  is  configured  in  an  F  shape,  similar  to  the 
shapes  visible  in  Fig.  2  and  Fig.  5.  Once  all  robots  are  in  Track 
state,  the  small  leg  of  the  spill  is  manipulated  to  condense  the 
spill  into  a  large  rectangle.  The  perimeter  of  the  large  rectangle 
is  roughly  5.6 m,  larger  than  N  *L,  so  the  robots  cannot  achieve 
full  equilibrium,  although  their  velocities  do  converge. 

The  experiment  shown  in  Fig.  7  utilizes  three  robots  and 
only  the  small  spill.  The  robots  in  this  case  quickly  converge 
in  Track  state  once  the  first  robot  finds  the  spill.  Due  to  the  ge¬ 
ometry  of  the  spill  relative  to  the  non-zero  size  of  the  robots  and 
spurious  communication  dropouts  (during  which  the  robots  stop 
for  safety),  the  velocities  do  not  converge  asymptotically.  How¬ 
ever,  the  velocity  variation  is  clearly  centered  about  a  nominal 
velocity,  and  the  robot-robot  spacing  never  deviates  from  equiv¬ 
alence  significantly. 

In  these  experiments,  the  communication  is  via  Bluetooth, 
which  becomes  unreliable  for  four  or  more  robots  communicat¬ 
ing  with  the  switchboard.  As  a  safety  precaution,  the  robots  stop 
driving  when  Bluetooth  communication  is  (temporarily)  lost; 
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Figure  6.  EXPERIMENTAL  RESULT  FOR  FIVE  ROBOTS  WITH  A 
SHRINKING  SPILL.  (A)  FULL  TRAJECTORIES  OF  ALL  ROBOTS,  WITH 
SNAPSHOT  LOCATIONS  AT  THE  MIDPOINT  OF  THE  RUN-TIME.  (B) 
LINEAR  SPEEDS  OF  ALL  AGENTS. 
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Figure  7.  EXPERIMENTAL  RESULT  FOR  THREE  ROBOTS  WITH  THE 
SMALLER  SPILL.  (B)  FULL  TRAJECTORIES  OF  ALL  ROBOTS,  WITH 
SNAPSHOT  LOCATIONS  AT  THE  MIDPOINT  OF  THE  RUN-TIME.  (B) 
LINEAR  SPEEDS  OF  ALL  AGENTS. 


these  occasional  stops  prevent  the  robots’  velocities  from  con¬ 
verging  asymptotically  (as  predicted  by  the  algorithm).  The  ve¬ 
locities  do,  however,  reach  a  stable,  bounded  value. 

Continuing  Work 

The  arena  workspace  is  currently  in  transition  to  utilize  di¬ 
rect,  on-board  control  of  the  robots.  Dell  MinilO  netbook  com¬ 
puters  operate  the  Create  robots  and  communicate  with  the  cen¬ 
tral  computer  via  wireless  ethernet  connections.  Along  with  the 
on-board  computers,  the  Create  robots  are  now  each  equipped 
with  a  forward-looking  Logitech  webcam  and  a  Hokuyo  scan¬ 
ning  laser  rangefinder.  The  webcam  provides  a  means  for  the 
Create  to  extend  its  spill-detection  capability  by  employing  tools 
from  the  OpenCV  [14,  15]  machine-vision  software  package. 
The  laser  rangefinder  will  be  used  to  provide  obstacle  avoidance 


and  direct  detection  of  neighboring  robots,  while  concurrently 
mapping  the  environment.  Additional  discussion  of  the  multi¬ 
robot  map-matching  problem  can  be  found  in  [16]). 

Fig.  8  shows  the  latest  configuration  of  the  robots  includ¬ 
ing  camera,  laser,  and  netbook.  Preliminary  experiments  con¬ 
ducted  with  this  new  robot  configuration  are  shown  in  Fig.  9 
and  Fig.  10.  Note  the  improved  convergence  of  velocities  due 
to  the  decrease  in  data  dropouts  with  the  on-board  netbooks.  In 
Fig.  10,  the  velocities  of  some  robots  briefly  increase  (around 
k=700)  when  the  smaller  spill  rejoins  the  larger  spill  and  shrinks. 
The  re-introduction  of  the  fifth  robot  into  the  graph  causes  a  dis¬ 
ruption  to  the  equilibrium,  which  quickly  settles  out.  Note  also 
that,  in  TRACK  mode,  the  agents  are  subjected  to  a  minimum 
speed  (in  this  case,  Vmin  =  50 mm/ s)  to  prevent  the  swarm  from 
converging  to  a  velocity  that  is  too  near  to  zero  to  be  realized. 

Parallel  development  is  focused  on  the  use  of  Gum- 
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Figure  8.  LATEST  CONFIGURATION  OF  CREATE  ROBOTIC  AGENT, 
USING  DELL  MINI10  NETBOOK,  LOGITECH  QUICKCAM  PRO  9000, 
AND  HOKUYO  URG-04LX-UG01  SCANNING  LASER  RANGEFINDER. 


Stix  Verdex  small-form-factor  computers  controlling  the  Create 
robots  via  a  Sticky  Interface  from  Element  Direct.  The  Gumstix 
computers  have  both  Bluetooth  and  WiFi  communication  abil¬ 
ity,  allowing  direct  robot  communication  without  switchboard¬ 
ing.  This  interface,  along  with  a  small  USB  hub  powered  off  of 
the  Create  battery,  likewise  allows  the  use  of  the  webcam  and 
laser  rangefinder  sensors  as  described  above. 

This  lab  has  recently  acquired  two  Skybotix  CoaX  heli¬ 
copters  with  Gumstix  Overo  computers  on-board.  The  heli¬ 
copters,  equipped  with  USB  cameras,  will  be  integrated  with  this 
system  to  provide  overhead  observation  of  larger-scale  areas,  al¬ 
lowing  extension  of  the  ground  robots’  sensing  ranges.  Simula¬ 
tion,  analysis,  and  planning  of  this  system  is  underway  for  the  ap¬ 
propriate  position  control  of  the  helicopters  to  provide  maximum 
utility  to  the  system,  drawing  some  inspiration  from  [17, 18]. 

To  aid  in  the  development  of  the  aerial  tracking  capability, 
this  lab  is  developing  a  more  indoor-safe  platform.  Based  on 
a  radio-controlled  blimp,  this  overhead  system  utilizes  Gumstix 
Verdex  and  Robostix  boards.  The  Verdex  computer  manages  the 
control  and  communication  with  the  system,  while  the  Robostix 
interfaces  with  a  lightweight  webcam  and  PING  ultrasonic  range 
sensors  for  navigational  control  and  altitude  maintenance. 

Use  of  the  autonomous  aerial  vehicles  as  surveyors  will 
yield  direct,  noisy  measurements  of  perimeter  length  and  number 
of  spills,  as  well  as  coarse  information  about  the  ground  robot  po¬ 
sitions.  This  information  could  be  utilized  to  actively  manage  the 
number  of  agents  participating  in  the  surveillance  of  the  perime¬ 
ters;  for  example,  some  ground  vehicles  could  be  maintained  in 
an  “On-call”  state,  not  expending  energy  until  it  is  needed  for 
perimeter  coverage. 


Snapshot  of  the  agents 


(A) 

agent  linear  speed 


Figure  9.  EXPERIMENTAL  RESULT  FOR  FIVE  ROBOTS  WITH  A  STA¬ 
TIONARY  LARGE  SPILL.  (A)  FULL  TRAJECTORIES  OF  ALL  ROBOTS, 
WITH  SNAPSHOT  LOCATIONS  AT  THE  MIDPOINT  OF  THE  RUN-TIME. 
(B)  LINEAR  SPEEDS  OF  ALL  AGENTS. 


CONCLUSION 

A  fully  distributable  control  methodology  for  detection 
and  monitoring  of  dynamic  perimeters  has  been  developed  and 
demonstrated.  This  control  paradigm  has  been  implemented  in  a 
mixed-simulation  environment  utilizing  independent  ground  ve¬ 
hicles  with  limited  communication.  The  practical  issues  related 
to  such  a  system  have  been  addressed,  and  work  is  continuing  to 
develop  a  completely  independent  swarm  that  does  not  rely  on 
the  external  motion-tracking  system. 
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Snapshot  of  the  agents 
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Figure  10.  EXPERIMENTAL  RESULT  FOR  FIVE  ROBOTS  WITH  A 
LARGE  SPILL  THAT  GROWS,  SPLITS  INTO  TWO  SPILLS,  THEN  COL¬ 
LAPSES  TO  ONE  LARGE  SPILL.  (A)  FULL  TRAJECTORIES  OF  ALL 
ROBOTS,  WITH  SNAPSHOT  LOCATIONS  AT  THE  MIDPOINT  OF  THE 
RUN-TIME.  (B)  LINEAR  SPEEDS  OF  ALL  AGENTS. 
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ABSTRACT 

Motivated  by  the  close  relation  between  estimation  and  con¬ 
trol  problems,  we  explore  the  possibility  to  utilize  stochastic  sam¬ 
pling  for  computing  the  optimal  control  for  a  large-size  robot 
population.  We  assume  that  the  individual  robot  state  is  com¬ 
posed  of  discrete  and  continuous  components,  while  the  popula¬ 
tion  is  controlled  in  a  probability  space.  Utilizing  a  stochastic 
process,  we  can  compute  the  state  probability  density  function 
evolution,  as  well  as  use  the  stochastic  process  samples  to  eval¬ 
uate  the  Hamiltonian  defining  the  optimal  control.  The  proposed 
method  is  illustrated  by  an  example  of  centralized  optimal  con¬ 
trol  for  a  large-size  robot  population. 

NOMENCLATURE 

R,  R"  set  of  real  numbers,  set  of  real  number  vectors  of  dimen¬ 
sion  n. 

Q  set  of  discrete  states,  i.e.,  integer  indexes  {1,2,3, . . .} 

Uad  set  of  admissible  control. 

p(x,t)  probability  density  function  (PDF)  of  the  hybrid  state  at 
time  1.  This  variable  is  a  vector  of  functions;  it  depends 
on  x  £  X  and  t  €  R,  but  jr  and  t  are  frequently  omitted 
in  expressions. 

p i(x,t)  the  PDF  component  corresponding  to  the  discrete  state 

i,  i  £  Q- 

n  (xf)  the  adjoint  state  distribution. 

7t  the  discrete  approximation  of  the  adjoint  state  distribution. 

<| )(x,t)  the  adjoint  state  PDF. 

Pfx.t)  the  probability  of  the  discrete  state  i,  i  £  Q. 

Pf(x.t)  the  probability  of  the  discrete  adjoint  state  ;,  i  £  Q. 

Ft  the  transition  rate  matrix. 
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Xjj  the  component  of  the  transition  rate  matrix  \Ft]ij  =  Xjj. 

Fu  the  transition  rate  matrix  that  depends  on  control  vector  u, 
U  £  Uad- 

Fr)  the  component  of  the  linear  operator  F  corresponding  to  the 
vector  fields  f  of  discrete  states  i  £  Q. 

//(p,«,f)  the  PDF,  the  control  and  time  dependent  Hamilto¬ 
nian,  p,  u  and  t  are  frequently  omitted. 
u*  the  optimal  control 

Ep  the  expected  value  with  respect  to  the  state  PDF  p. 

INTRODUCTION 

Solutions  of  multi-robot  control  problems  may  be  of  enor¬ 
mous  complexity  because  of  the  operating  environment  uncer¬ 
tainties,  or  a  large  number  of  redundant  states  and  robots.  For 
many  years  it  has  been  known  that  the  optimal  control  and  op¬ 
timal  estimation  problems  are  closely  related  [1].  For  example, 
the  linear  quadratic  regulator  (LQR)  and  the  Kalman  filter  (KF) 
estimator  can  be  derived  in  the  same  optimal  control  framework. 
Having  this  in  mind,  it  is  expected  that  estimation  methods  based 
on  statistical  sampling,  and  employed  for  solving  complex  esti¬ 
mation  problems  [2],  can  contribute  to  solving  complex  control 
problems  for  a  single  and,  more  importantly,  multi-robot  systems 
under  the  presence  of  uncertainty. 

Along  this  idea,  Kappen  et.  al.  [3, 4]  used  stochastic  dif¬ 
ferential  equations  to  model  individual  agents.  Based  on  this 
description,  it  is  possible  to  relate  the  Hamilton-Jacobi-Bellman 
partial  differential  equation  with  samples  of  the  stochastic  pro¬ 
cess  trajectories  and  use  the  samples  to  define  the  stochastic  op¬ 
timal  control  of  multi-agent  systems.  In  this  framework,  the 
state  of  individual  robots  is  continuous.  However,  the  state  of 
real  robots  is  generally  described  by  a  combination  of  contin- 
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uous  and  discrete  variables,  i.e.,  by  a  hybrid  state.  Therefore, 
it  is  more  natural  to  describe  the  robot  behavior  using  a  hybrid 
automaton  |5].  The  automaton  describes  the  discrete  and  con¬ 
tinuous  variables  change  in  time,  which  depends  on  events  in¬ 
fluencing  the  robot  behavior.  In  the  case  of  a  large-size  multi¬ 
robot  population,  it  becomes  highly  complex  to  predict  events 
from  the  robot  local  environment.  Because  of  that,  we  model 
a  large-size  robot  population  considering  the  stochastic  hybrid 
model  and  study  how  it  can  be  controlled. 

In  this  paper,  we  consider  a  problem  in  which  the  presence 
of  a  large-size  robot  population  in  a  desired  region  of  operat¬ 
ing  space  is  maximized.  This  problem  is  formulated  in  a  hybrid 
system  framework  in  [6].  Its  solution,  based  on  flic  minimum 
principle  for  partial  differential  equations,  is  presented  in  [7, 8], 
and  it  is  solved  numerically  when  the  presence  of  the  robots  is 
maximized  along  one  dimension  (ID). 

The  Hamiltonian,  which  defines  the  optimal  control,  in¬ 
cludes  integral  terms  that  depend  on  the  solution  of  a  system  of 
partial  differential  equations  (PDE).  This  system  of  PDEs  is  in 
general  difficult  to  evaluate  and  the  numerical  evaluation  of  in¬ 
tegrals  is  prone  to  errors.  However,  we  recognize  that  the  prob¬ 
lem  solution  can  be  simplified  and  propose  to  use  samples  of 
the  stochastic  processes  to  evaluate  the  Hamiltonian  components 
from  the  expected  values  of  the  adjoint  state  distribution. 

The  direction  of  the  research  we  are  pursuing  is  consider¬ 
ably  different  from  the  stochastic  optimal  control  work  presented 
in  [9].  There,  stochastic  processes  have  been  used  only  as  an  an¬ 
alytical  tool  to  map  the  stochastic  process  to  be  controlled  into 
the  finite  state  space,  in  which  the  optimization  is  performed. 
The  benefit  of  using  a  solution  based  on  sampling,  i.e,  compu¬ 
tational  statistical  methods,  is  that  control  problems  in  robotics 
could  be  solved  faster.  This  possibility  also  depends  on  the  abil¬ 
ity  to  implement  sampling  and  computations  with  samples  into 
the  processor  computing  the  control. 


MODELING  AND  CONTROL  FRAMEWORK 

In  the  modeling  framework  we  consider,  the  stale  of  an 
individual  robot  at  lime  t  is  uniquely  defined  by  the  couple 
{x(l).q(l)),  x  €  X,  X  €  R",  q&Q.Q  {1,2, ..W}.  While  in 
the  discrete  state  (mode)  k  E  Q,  the  continuous  state  of  a  robot 
obeys  the  differential  equation  x  fk(x,t).  We  also  assume  that 
switching  among  the  discrete  states,  say  from  the  slate  k  £  Q  to 
the  state  j  e  Q,  (k  ^  j),  is  described  by  stochastic  transition  rates 
Xi ij,  and  that.v(i)  is  a  continuous  function  of  time.  In  other  words, 
the  continuous  state  just  before  the  discrete  state  transition  ,v (t  ) 
is  equal  to  the  state  ,v(/+ )  after  the  state  transition.  This  very  gen¬ 
eral  model  of  an  individual  robot  is  illustrated  in  Fig.  1  and  the 
modeling  framework  we  are  applying  here  is  detailed  in  |8|. 

Recognizing  dial  the  state  of  an  individual  robot  is  com¬ 
posed  of  discrete  and  continuous  components,  the  slate  proba¬ 
bility  density  function  (PDF)  is  a  vector  of  functions  p(.vj) 

[pi  (a%  r)  p2 (-V, / ) . .  .pw(.v./)]'  [81.  Each  component  p,(.v,f)  cor¬ 
responds  to  the  discrete  state  /,  and  the  symbol  (’)  denotes  the 


<?=-? 


Figure  1 .  STOCHASTIC  HYBRID  AUTOMATON  MODEL  OF  A  ROBOT 
IN  A  PROBABILISTIC  FRAMEWORK:  DISCRETE  STATE  q\  CONTINU¬ 
OUS  STATE  X  VECTOR  FIELD  fk,  k  €  Q  DESCRIBES  THE  CHANGE 
OF  THE  CONTINUOUS  STATE;  STOCHASTIC  TRANSITION  RATES 
Xkj,  kj  €  Q  DESCRIBE  THE  MODE  SWITCHING;  y  IS  THE  MEASUR¬ 
ABLE  OUTPUT;  IF  THE  FULL  CONTINUOUS  STATE  OF  THE  ROBOT 
IS  MEASURABLE,  C  IS  THE  UNITY  MATRIX. 

vector  transpose.  The  state  PDF  satisfies 
£  f  Pi(x,l)dx=  Ytpi(‘)  =  1-  where  0(0  =  f  P>(x,t)dx 

iZQJX  ;€q  Jx 

(1) 

where  Pi(l)  is  the  probability  of  the  discrete  state  /  at  the  time 
point  t.  Let  us  define  the  vector  of  discrete  probabilities  P(t) 

[Pi  (r), ft  (/),.. . /^(tf)]*,  then  evolution  of  the  probability  vector 
is  given  by: 

P(t)  =  F,(i)P(t),  where  [F,[y  =  Xu(t)  (2) 

with  matrix  F,  defining  the  transition  rates  among  the  discrete 
states.  In  general,  the  correspondence  between  the  matrix  F, 
members  [/>]y  and  the  transition  rates  X,j  is  not  one-to-one. 
Assuming  that  the  transition  rates  depend  on  the  vector  u(t) 
[iq(f)  1/2(0  . ..«A/(0r  °f  variables  1,2... M,  we  can  de¬ 

fine  the  transition  rate  matrix  as  a  function  of  the  vector  «(/),  i.e., 
F,(t)  =  Fu(u(r)).  Consequently,  the  vector  of  the  discrete  state 
probabilities  obeys  [10]: 

P(r)  =  Fu(u)P(t )  (3) 

Moreover,  it  can  be  proven  [8]  that  the  state  PDF  obeys  the  fol¬ 
lowing  system  of  partial  differential  equations  (PDE): 

=  F(M)p(.v,0  =  (Fu(U(0)  +Fd)p(x,t)  (4) 

where  F()  is  the  diagonal  linear  differential  operator.  When  the 
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operator  Fg  is  applied  to  p(x,f),  it  results  in: 


The  function  vector  % (x,t)  is  the  so-called  adjoint  state  distribu¬ 
tion  and  satisfies: 


[F3p(x,f)],;  = 


-V  -(fiPi(x,t)),i  =  j 

0)  *  7^  j 


(5) 


Taking  into  account  that  the  state  PDF  evolution  p(x,f)  de¬ 
pends  on  the  vector  u(t),  we  can  formulate  the  optimal  control 
problem  in  the  probability  space  using  the  cost  function: 

/  =  [  w,(x)p(x, T)dx  (6) 

Jx 


=  -F'(u)n(x,t)  (12) 

7t(x,  F)  =  —  w(x)  (13) 

The  major  difficulty  in  computing  the  optimal  control  is  in 
evaluation  of  integrals  (11)  and  corresponding  PDE  system  solu¬ 
tions  (4)  and  (12).  Based  on  the  definition  of  the  scalar  product 
(11),  the  Hamiltonian  can  be  expressed  as: 


In  this  respect,  the  optimal  control  problem  is  the  optimization 
problem: 

u*{t)=  max  /=  max  /  w(x)p(x, T)dx  (7) 

{uj  •'  X 

Alternatively,  to  avoid  the  singular  control  problems  [8],  we  can 
consider  the  optimal  control  that  includes  the  term  penalizing  the 
control: 


//(p(x, /),«,/)  =  (p  {x,t),F'(u)%{x,t))  (14) 

’1-e”  H(p(x,t),u,t)  =  YiJ^pi(x,t)[F\u)n{x,t)]idx  (15) 

where  [•],  denotes  the  zth  row  of  the  vector.  In  the  following 
section,  we  will  explain  how  the  evolution  of  the  state  PDF  p, 
as  well  as  expression  (15)  can  be  computed  using  the  stochastic 
sampling  propagator. 


u*(t)=  max  J=  max  /  w(x)p(x, T)dx  +  E  /  u'(t)u'(t) 
u(l)^uad  u{t)eUadJx  Jo 

(8) 

Anyway,  the  solution  of  this  problem  is  a  sequence  of  the  opti¬ 
mal  control  u*(t),  from  the  set  of  admissible  control  Uad,  such 
that  the  cost  function  is  maximized.  By  a  suitable  choice  of  the 
weighting  function  w(x),  the  cost  function  can  be  used  to  find  the 
optimal  control  maximizing  probability  of  the  robot  presence  in 
the  desired  region  of  the  robots’  operating  space. 

The  optimal  control  maximizing  the  criterion  (6)  is  a  special 
case  of  a  more  general  optimal  control  problem  of  the  evolu¬ 
tion  equation  [11],  Under  the  condition  that  the  operator  F(m)  is 
bounded,  i.e.,  ||F(z<(f))||  <  °°,  the  minimum  principle  for  PDEs 
can  be  applied  [11],  According  to  the  minimum  principle,  the 
optimal  control  u*(t)  satisfies: 

u*(t)  =  arg  min  H(p(x,t),u(t)1t)  (9) 

ueuad 

In  other  words,  for  the  optimal  state  PDF  trajectory  p*(x,f),  the 
optimal  control  minimizes  the  Hamiltonian  at  each  time  point. 
The  Hamiltonian  is: 

H(p(x,t),u,t)  =  {n{x,t),F(u)p{x,t))  (10) 

where  brackets  (•,  •)  denote  the  scalar  product  of  function  vectors 
defined  as: 

(p(x),q(x))  =  /  p\x)q(x)dx  =  /  Ypi(x)qi(x)dx  (11) 
Jx  JXj 


STOCHASTIC  SAMPLING  PROPAGATOR 

The  evolution  of  the  large-size  population  probability  den¬ 
sity  function  p(x,f)  is  described  by  the  PDE  system  (see  Eq.4). 
One  way  to  obtain  the  evolution  p(x,f)  is  to  solve  the  PDE  sys¬ 
tem  forward  in  time  starting  from  the  initial  condition  p(x,0)  = 
p°(x).  We  propose  an  approach  to  computing  the  evolution 
p(x,f)  based  on  stochastic  trajectories  of  the  hybrid  state  ( x,q ) 
evolution  resulting  from  the  model  presented  in  Fig.  1.  To  ac¬ 
count  for  the  fact  that  the  transition  rates  can  change  in  time,  we 
assume  that  the  control  is  a  piecewise  constant  function  of  time 
discretized  with  the  sample  time  AT.  The  basis  for  the  proposed 
algorithm  is  the  Gillespie’s  stochastic  simulation  algorithm  [12], 
To  generate  the  trajectory  of  (x.q),  we  need  to  generate  the 
initial  state  (x(0),g(0))  from  the  state  PDF  p(x,0)  =  p°(x).  Prob¬ 
ability  Pj(t)  of  q(t)  =  i  is: 

Pi(t)  =  /  p i(x,t)dx  (16) 

Jx 

Therefore,  the  random  variable  q{ 0)  =  i  should  be  gener¬ 
ated  from  the  discrete  state  probability  distribution  repre¬ 
sented  by  the  vector  of  discrete  state  probabilities  P(0)  = 
[Pi  (0)  P2 (0)  . . . Piv(0)]/.  Symbolically,  we  will  represent  it  as: 

q(0)=i~P(0)  (17) 

Once  the  initial  discrete  q(0)  state  is  defined,  the  continuous  vari¬ 
able  x(0)  can  be  generated  from  the  corresponding  p,  (x,0)  com¬ 
ponent  of  the  state  PDF  ,i.e.,  from  the  probability  P  of  x(t)  given 
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that  q(t)  =  i  and  t  =  0: 


Following,  the  expected  value  of  [F'(tf)7r(x,?)li  under  the  condi¬ 
tion  that  the  discrete  state  q(t)  =  i  is 


x(0)  ~  <P{x\q(t)  =  i,t  =  0}  =  pi(x,0) / Pi(0)  (18) 

Whenever  the  discrete  slate  is  q(t)  =  i,  the  evolution  of  the 
continuous  state  x  obeys  x  =  f,(x).  Therefore,  generating  trajec¬ 
tory  ( x(t),q(i ))  reduces  to  the  problem  of  generating  the  state 
transitions  of  the  discrete  state  q(t).  Let  us  assume  that  at  time 
l  —  Jj ,  ts  £  [(A  —  l)  AT,  kAT),  the  hybrid  state  is  ( x(t),q(l))\  then, 
the  time  instant  at  which  the  state  changes  tc  can  be  generated 
based  on  the  following  two  rules: 

(a)  t c  =  ts  +  it ,  tt  ~  e  under  the  condition  that 

tc  <  kAT.  If  the  condition  is  not  satisfied,  apply  rule  (b). 

(b)  tc  =  kAT  +/,,  t,  ~  under  the  condition  that 

tc  <  (k  +  1  )AT.  If  the  condition  is  not  satisfied,  increase  k 
by  1 .  Apply  rule  (b)  until  the  condition  is  satisfied. 


E{F'(u)K{x,t)\q{t)  =  i}  =  -jsjjjj Pi{x,t)\F'(u)n(x,t)\idx 

(22) 

and  can  be  approximated  as: 

£{F'(M)7C(.V,0|9(/)  =  i}  «  ^|F'(h)k(Jc*,/)]/8(9*(/)-0 

(23) 

where  8 (#*(/)  —  i)  =  1  if  qn(t)  =  i ,  and  zero  elsewhere.  Conse- 
qently,  the  Hamiltonian  (15)  can  be  expressed  as: 

H  {p{x,t),u,t)  =  Y.lPi(‘)E{F'(u)%(x,t)\q(t)  =  /}  (24) 


These  two  rules  define  the  time  point  tc  at  which  the  jump  from 
the  discrete  state  /  to  the  discrete  state  j  happens,  but  do  not 
specify  the  variable  j.  The  state  j  needs  to  be  sampled  from  the 
discrete  stale  probability  density  function,  i.e.,  from  the  proba¬ 
bility  tP  of  q(t H)  /,  given  that  q(t)  i  provided  in  the  vector 
of  the  discrete  probability  distribution  with  N  —  1  elements: 


j  ~  <P{q{t)\q{t  )  =  /}  = 


N- 1 

(19) 

The  above  algorithm  can  be  used  to  generate  a  single  trajec¬ 
tory  for  the  stochastic  model  shown  in  Fig.  1 .  In  the  limit  of 
a  large  number  of  samples,  the  normalized  density  of  trajectory 
points  will  correspond  to  the  solution  of  the  PDE  system  given 
by  Eq.  4.  In  this  respect,  the  stochastic  simulation  is  a  compu¬ 
tational  propagator  of  the  evolution  p(.v./)  and  we  can  denote  it 


as: 


ap 

dt 


Fsim  (w(f  ))p 


(20) 


HAMILTONIAN  EVALUATION 

Let  us  assume  that  the  total  number  of  the  trajectories  we  use 
to  propagate  the  state  PDF  p  is  Nsampy  that  x^t)  and  q^t)  denote 
the  continuous  state  and  discrete  states  of  trajectory  k  at  time  /, 
k  =  1 . . .  Nsamp  ■  At  a  given  time  point  t,  among  Nsamf,  trajectories 
only  Nj(t)  trajectories  arc  in  the  discrete  slate  i,  and  naturally, 
£,/V,(/)  =  Nsamp,  Vr.  From  the  state  PDF  normalizing  condition 
(1),  we  can  conclude  that: 

W)LMx’,)=' 


In  the  limit  of  a  large  number  of  samples  P,(/)  =»  Nj{t)/Nsamp\ 
therefore,  the  Hamiltonian  value  can  be  estimated  using  die  fol¬ 
lowing  expression: 


H(p(x,t),u,t)  =  -5(^(0  -0  (25) 

Wsamp  j  k 

where  denotes  the  sum  over  all  trajectories  and  over  all 
discrete  states.  This  expression  is  exact  in  the  limit  of  a  large 
number  of  samples  Nsamp.  To  illustrate  and  verify  the  algorithm 
lor  generating  stochastic  trajectories  (x(t).q(t))  and  computing 
the  Hamiltonian  components,  we  use  a  1 D  example  in  the  next 
section. 


ID  EXAMPLE 

The  stochastic  model  presented  in  Fig.  2b  illustrates  the 
stale  PDF  evolution  of  a  large-size  robot  population  along  one 
dimension  (Fig.  2a),  in  which  u\,  112  and  M3  correspond  to 
stochastic  rates  of  the  commands:  move-left,  move-right  and 
stop.  In  this  example,  k\  —  —0.5  and  k2  =  0.25.  The  control 
u(t)  =  [mi(/)  u2(t)  M3 (/ ) |  is  computed  as  the  optimal  control 
based  on  the  minimum  principle  and  Hamiltonian  presented  in 
the  previous  section. 


Figure  2.  ID  EXAMPLE  [7,8] 
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The  cost  function  is: 


J=  [  w'(x)p(x,t)dx  +  e  [  u}(l)  +i£(t)  +  u\{t)di  (26) 
Jx  J  o 


where  e  10  7,  the  weighting  vv(.v)  (0  0  w^jc)]'  and  the  initial 
condition  p(x,0)  |0  0  p3(.v,0)|' are  defined  by: 


VV’3  (.v)  =  <  v'0.01 


acpi-k £££),  1.25  <*<2.25 
0,  otherwise 


(27) 


P3(v.0) 


i 


/0.02II 


exp(- 


0.02 
0,  otherwise 


),  2  <x  <  3 


(28) 


The  optimal  control  sequence  u‘(t)  [i/J(r)  u\{t)  (t ) |  in  the 

time  interval  0  <  /  <  3  is  defined  by: 
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it 
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2, 


0.21  <f  <  1.74 
0,  elsewhere 


«;(/)  =  0 


(29) 


Figure  3.  THE  FINITE  ELEMENT  SOLUTION  OF  THE  STATE  PDF 
EVOLUTION  FOR  THE  ID  EXAMPLE  UNDER  THE  OPTIMAL  CON¬ 
TROL  «*(/),  500  POINTS  [7.8] 


f  2.  1.71  <  /  <  3 
[  0.  elsewhere 


(30) 


The  evolution  of  the  stale  PDF  for  this  system  under  the  control 
u*(t)  is  presented  in  Fig.  3.  We  present  only  pi  (x,t)  and  p:,(.vj) 
because  under  this  control  P2(v,r)  0,  Vf. 

For  the  illustration,  we  generated  10  stochastic  trajectories 
of  the  continuous  variable  x  (see  Fig.  5)  under  the  control  «*(/). 
The  evolution  of  the  discrete  state  q  can  be  observed  from  the 
trend  in  x.  When  .v  decreases,  the  discrete  state  is  1 ,  and  when  it 
remains  constant,  the  state  is  q  =  3.  It  is  worth  mentioning  that, 
among  these  10  trajectories,  there  is  one  for  which  x(t)  is  con¬ 
stant.  The  small  pick  around  the  point  2.5  in  the  right  panel  of  the 
Fig.3  at  t  3,  confirms  that  the  probability  of  such  trajectories  is 
non-zero,  but  it  is  small. 

To  obtain  the  state  PDF  p(.vy),  i.e.,  its  components  p,(.v,f) 
at  a  specific  time  point  /,  we  need  to  collect  points  x(t)  and  esti¬ 
mate  components  p, ■(*./).  It  is  obvious  that  10  trajectories  cannot 
provide  a  good  estimate  of  p(.v,f)  For  this  reason,  we  generated 
105  trajectories  and  computed  the  histogram  probability  density 
function  estimate.  That  means  that  we  discretized  the  x  axis  into 
intervals  of  the  length  Av  and  counted  how  many  points  fell  into 
a  specific  region.  Finally,  we  normalized  the  histogram  so  that 
the  estimated  p(.v,  t)  is  normalized  to  1 .  Our  results  are  presented 
in  Fig.  4.  As  expected,  the  match  between  the  numerical  PDE 
system  solution  and  the  result  obtained  from  stochastic  trajecto¬ 
ries  is  exact.  There  are  only  negligible  discrepancies  due  to  data 
sampling  from  a  finite  number  of  trajectories. 


Position  x  [km] 

Figure  4.  THE  STOCHASTIC  SIMULATION  SOLUTION  OF  THE  STATE 
PDF  EVOLUTION  FOR  THE  ID  EXAMPLE  UNDER  THE  OPTIMAL 
CONTROL  (/*(/),  10s  SAMPLES 

In  Table  1 ,  we  provide  the  time  our  M  ATLAB  code  takes  to 
compute  the  state  PDF  evolution  based  on  the  stochastic  simula¬ 
tion  approach  (Fig.  4)  and  the  time  it  takes  to  compute  the  evo- 
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Figure  5.  A  RANDOM  SET  OF  1 0  TRAJECTORIES  RESULTING  FROM 
THE  STOCHASTIC  SIMULATION  UNDER  THE  OPTIMAL  CONTROL 

u*(t) 


Figure  6.  COST  FUNCTION  J  ITERATIONS  COMPUTED  WITH  THE 
ALGORITHM  [8]  WITH  THE  HAMILTONIAN  EVALUATED  BASED  ON 
THE  PDE  SOLUTIONS  (DASHED),  AND  ESTIMATED  BASED  ON  104 
(DOTTED)  AND  105  (SOLID)  STOCHASTIC  SAMPLES. 

lution  based  on  the  finite  element  (FE)  approach  with  500  points 
(Fig.  3).  We  can  see  that  the  time  for  the  stochastic  simulation 
approach  increases  roughly  linearly  with  the  number  of  samples 
and  that  the  time  of  the  FE  approach  can  be  reached  if  we  use 
approximately  3  •  10s  samples. 

For  the  purpose  of  computing  the  optimal  control,  the  num¬ 
ber  of  the  samples  we  need  depends  on  the  convergence  of  the 
optimization  algorithm  to  the  solutions.  Therefore,  as  the  final 
test  of  the  stochastic  sampling  propagator,  we  computed  the  op¬ 
timal  control  based  on  the  algorithm  presented  in  [8],  but  using 
the  stochastic  samples  for  the  Hamiltonian  evaluation  (25)  in- 

Table  1 .  TIME  COMPARISON  BETWEEN  THE  STOCHASTIC  SIMULA¬ 
TION  APPROACH  AND  THE  FINITE  ELEMENT  (FE)  SOLUTION  WITH 
500  POINTS _ 


Number  of  samples 

103 

104 

105 

FE 

Time  (s) 

0.48s 

4.8s 

44.8s 

1363s 

stead  of  the  Hamiltonian  which  is  completely  based  on  PDE  sys¬ 
tems  solutions  [8].  We  can  see  from  Fig.  6  that  for  104  samples, 
the  Hamiltonian  (25)  fluctuations  are  at  such  a  level  that  the  op¬ 
timization  algorithm  does  not  converge,  and  we  see  it  as  large 
fluctuations  of  the  cost  function.  While  the  fluctuations  are  in¬ 
trinsic  property  of  the  stochastic  samples  evaluation,  they  can  be 
smaller  if  we  use  a  larger  number  of  samples. 

When  we  use  105  samples,  the  fluctuations  are  much  smaller 
and  the  optimization  algorithm  converges.  It  is  interesting  to  no¬ 
tice  that  the  stochastic  sampling  evaluations  result  into  the  cost 
function  value  which  is  smaller  that  the  value  resulting  from  the 
PDE  system  solutions  (see  Fig.  6).  This  is  because  of  discrete 
approximations  involved  into  solutions  of  PDE  systems,  as  well 
as  approximation  of  integrals  contributing  to  the  cost  function. 

The  average  time  that  our  MATFAB  code  takes  for  a  sin¬ 
gle  iteration  based  on  the  finite  element  method  (500  points)  is 
1629s  (Fig.  6,  dashed  line).  Using  the  same  optimization  code, 
a  single  iteration  in  the  stochastic-based  method  with  105  sam¬ 
ples  is  350s  (Fig.  6,  solid  line).  Under  the  conditions  presented 
above,  the  stochastic-based  method  is  in  average  4.5jt  faster  than 
its  deterministic  counterpart. 


CONCLUSION 

In  this  paper  we  considered  a  large-size  robot  population 
control  problem  that  had  been  previously  formulated  and  solved 
in  a  probability  space  utilizing  systems  of  PDEs.  Solving  these 
PDEs  is  computationally  expensive;  therefore,  having  in  mind 
that  the  PDEs  are  in  close  connection  with  the  stochastic  process 
to  be  controlled,  we  explore  an  opportunity  to  utilize  the  stochas¬ 
tic  process  samples  to  compute  the  control. 

Our  paper  describes  an  algorithm  for  generating  the  stochas¬ 
tic  process  that  can  be  used  to  propagate  the  state  PDF  of  the 
robot  population.  We  show  that  the  algorithm  predicts  exactly 
the  state  PDF  evolution  and  we  derive  expression  for  the  Hamil¬ 
tonian  evaluation  which  involves  the  stochastic  process  samples. 
The  Hamiltonian  evaluated  in  this  way  can  be  used  in  iterations 
computing  the  optimal  control  as  if  it  was  computed  based  on 
the  PDE  system  solutions.  We  also  notice  that  the  cost  function 
resulting  from  utilizing  stochastic  processes  has  smaller  values 
than  the  cost  function  computed  based  on  PDEs.  This  means 
that  the  evaluations  involving  the  stochastic  process  samples  are 
closer  to  the  true  values. 

In  summary,  we  can  conclude  that  utilizing  stochastic  pro¬ 
cesses  for  computing  control  of  multi-robot  systems  considering 
discrete,  as  well  as  continuous  robot  states  is  possible.  By  em¬ 
bedding  stochastic  process  generators  into  analog  circuits  and 
utilizing  them  in  dedicated  processors  for  computing  control, 
complex  stochastic  optimal  control  problems  can  be  solved  ef¬ 
ficiently  and  potentially  exploited  for  real-time  multi-robot  sys¬ 
tems  control. 
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ABSTRACT 

This  paper  explores  the  characterization  of  a  multi¬ 
agent  system  through  novel  application  of  a  potential 
force  function  that  includes  tuning  coefficients. 
System  parameters  are  varied  against  a  pre-defined 
area  mapping  and  target  identification  mission  and 
evaluated  against  system  performance  metrics  to 
isolate  agent  capability  combinations  most  important 
to  pre-mission  planning  functionality.  Excursions 
were  run  in  simulation  iterations  producing  rich  data 
sets  for  evaluation  and  reduction.  Additional  future 
research  will  investigate  the  impact  of  potential 
function  tuning. 

Keywords:  multi-agent  system,  robot  cooperation, 
mission  planning,  potential  force  functions 

INTRODUCTION 

The  concept  of  organizing  groups  of  robots  into 
“multi-agent  systems”  (MAS)  has  existed  for  several 
years.  Research  in  the  field  has  centered  on 
understanding  the  root  mechanisms  and  control 
features  that  can  define  and  drive  the  behavior  and 
capability  of  the  MAS  architectures  in  terms  such  as 
task  allocation,  flocking  or  foraging.  The  study  of 
analogous  biological  systems  has  also  provided 
valuable  insight.  However,  to  date,  there  has  been 
little  work  done  with  respect  to  understanding  the 
capability  of  the  MAS  as  a  function  of  its 
composition  given  the  option  of  pre-mission  selection 
of  the  performance  and  capability  of  individual 
nodes,  as  well  as  opportunity  to  include  extra-system 
information  sources  or  sensors. 

The  ability  of  a  MAS  to  perform  complex  missions 
beyond  the  capability  of  today’s  existing  unmanned 


systems  will  be  principally  a  function  of  three 
advancements.  First  is  the  reduction  of  the  electronic 
and  mechanical  components  of  the  physical 
configuration  while  increasing  or  maintaining 
desirable  performance  characteristics.  The  second  is 
the  characterization  of  the  behavioral  control 
mechanisms  on  the  MAS,  such  that  complex  tasks 
can  be  performed  with  significant  mission 
accomplishment  in  a  dynamic  set  of  boundary 
conditions,  core  capabilities,  and  constraints.  Third, 
and  the  principal  motivation  for  this  research,  is 
increasing  the  capability  of  the  MAS  itself  as  a 
function  of  composition.  Due  to  the  inherent 
flexibility  and  variety  that  robotic  systems  typically 
offer,  it  is  merely  intuitive  that  the  strength  of  a 
MAS’  capability  will  be  in  its  diversity.  The  major 
benefit  arising  out  of  this  research  effort  is  to  gain  a 
better  understanding  of  the  impact  that  variations  in 
the  selected  composition  of  the  MAS  can  have  on  its 
resulting  capabilities  and  performance  with  a  focus 
on  the  composition-control  scheme  relationships. 

BACKGROUND 

From  the  inception  of  the  development  of  robotic 
technologies,  researchers  have  generally  concluded 
that,  with  many  if  not  most  applications,  there  are 
significant  advantages  to  be  gained  through  the 
coordination  and  orchestration  of  multiple  robotic 
entities  [1,  2].  This  is  not  difficult  at  all  to  imagine 
by  simply  observing  Darwinian  development  in 
biological  systems  whereby  two  or  four  legs  proved 
superior  to  one  for  locomotion;  or  separate  arms  each 
with  a  hand,  fingers,  and  an  opposing  thumb  proved 
superior  for  grasping  and  cooperatively  manipulating 
objects.  Further,  it  similarly  is  understandable  that  a 
small  group  of  separate  robotic  entities  (e.g.,  a  small 


number  of  cooperating  or  coordinating  unmanned 
ground  vehicles)  could  also  exhibit  significantly 
desirable  characteristics  in  certain  circumstances. 
Five  significant  motivations  for  using  multi-robot 
systems  include:  task  complexity;  task  distribution; 
resource  distribution;  parallel  processing;  and 
robustness  through  redundancy  [3].  Some 
researchers  have  explored  the  advantages  gained  for 
specific  tasks  such  as  exploration  where  now  the  task 
can  be  accomplished  by  numbers  of  agents  surveying 
the  maneuver  space  versus  a  single  robot  tasked  with 
exploring  the  entirety  by  itself  [4].  Indeed,  these 
become  valuable  against  overarching  task 
accomplishment;  however,  not  without  burden. 
Multi-robot  systems  add  communications  overhead 
and  worldoad  with  the  requirement  for  inter-robot 
cooperation  and  knowledge.  They  also  add 
complexity  to  path  planning  and  collision  avoidance 
schemes  since  now  each  robot’s  cooperating  peer 
also  exists  as  an  obstacle  to  avoid  [5].  Multi-robot 
systems,  say,  with  on  the  order  of  10  nodes,  increase 
this  complexity.  Beyond  that  number,  say  tens  or 
hundreds  of  nodes  comprising  “swarms,”  might 
illustrate  orders  of  magnitude  complexity  increase. 

Attempts  [6-8]  at  defining  MAS  composition 
explored  heterogeneous  versus  homogeneous 
systems,  highlighting  concepts  such  as  hierarchic 
social  entropy,  diversity,  and  cooperative 
localization.  Heterogeneous  MAS  have  been  used  to 
explore  unknown  environments  for  the  purpose  of 
mapping  and  exploration  [9].  Here,  each  MAS 
featured  varying  types  of  single-mode  sensor 
payloads,  including  for  example  an  infrared  camera 
or  sonar,  coordinated  in  an  occupancy  grid  Bayesian 
mapping  algorithm.  But  again,  the  composition  of 
the  architecture  was  held  constant. 

The  underlying  control  architecture  plays  an 
absolutely  critical  role  in  the  efficiencies  observed  in 
the  multi-agent  versus  single  robot  architectures  [10- 
13].  Clearly,  beyond  the  inherent  capability  of  the 
electro-mechanical  configuration  as  well  as  the 
sensor  payloads,  the  single  greatest  contributor  to 
mission  efficiencies  and  task  capabilities  will  exist  in 
the  capability  and  comprehensiveness  of  the  control 
system.  A  wide  variety  of  algorithms  and  schemes 
have  been  explored  including  feedback  laws,  leader 
follower,  task-oriented  mission  planning,  task 
reallocation  and  reordering,  and  deliberative  and 
reactive  behavior  modeling. 

AGENT  KINEMATICS 

Differential-wheeled  robots  (Figure  1)  comprise  a 
large  segment  of  the  experimental  robot  type  used  in 


research  today.  Quite  simply,  there  are  two  drive 
wheels  situated  parallel  and  opposite  to  each  other 
that  can  be  independently  controlled  in  terms  of 
applied  torque  to  dictate  the  robot’s  motion.  The 
kinematic  equations  governing  differential-wheeled 
robots  are: 


Xj  =  Vi  sin  (0j) 

(l) 

II 
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o 

C/3 

(2) 
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where, 

l?j  is  the  linear  velocity, 

is  the  rotational  velocity,  and 
Xj,  Zj,  are  the  absolute  position  and  orientation, 
respectively. 

Further,  the  position  of  the  i-th  robot  can  be 
represented  by  the  vector: 

n  =  [Xi.Zif.Ti  G  R2  (4) 

And  the  vectors  representing  N  robots  comprising  the 
multi-agent  system  (MAS)  is: 

tmas  =  [tf.rj . rfi]T, rMAS  G  R2  (5) 


FIGURE  1  -  Differential-wheeled  Robot  Kinematics 

ROBOT  CONTROL  STRATEGY 

Control  via  potential-field  theory  has  proven  to  be  a 
valuable  basis  upon  which  more  complex  control 
feedback  influence  can  be  implemented.  In 
particular,  potential-field  theory  affords  a 
comprehensive  and  simple  method  to  add  factors  that 
can  be  scaled  either  as  independent  input  functions  or 
dependently  upon  selected  system  features. 

The  basis  of  the  potential  field  is  to  represent  system 
control  influences  as  independent  or  dependent  forces 
[14].  Each  force  is  calculated  based  on  physical 
parameters  (such  as  real  distances)  or  on 
representative  values  (such  as  arbitrary  repulsion  to 
facilitate  obstacle  avoidance  or  attraction  to  a  desired 
target  location).  In  conjunction  with  the  potential 
field,  it  is  convenient  to  represent  the  operational 
space  as  an  occupancy  map,  i.e.,  a  two-  or  three- 


dimensional  grid  that  sub-divides  the  space  into 
identifiable  cells.  Each  cell  would  then  have 
characteristics  such  as  a  potential  related  to  its 
occupancy  status,  or  status  as  a  target  destination. 
Since  potential  is  a  function  of  relative  proximity,  the 
relative  location  of  the  cells  would  also  be  cataloged. 

The  goal  of  the  control  function  for  MAS  will  be  to 
calculate  the  agent-unique  potential  force  witnessed 
by  each  of  the  agents  comprising  the  MAS.  Force 
will  define  the  resultant  heading  and  speed  that  the 
individual  agent  would  follow.  The  potential  force 
existent  on  the  Mth  agent  is  given  by: 


L 


+ 

i= 1 

where, 

Qm  =  inherent  charge  of  the  Mth  agent, 

Qi  =  inherent  charge  of  the  ith  agent, 

(Ji  =  inherent  charge  of  the  ith  cell, 

rL  =  radius  vector  from  the  Mth  agent  to  the  ith  agent 

or  ith  cell, 

N  =  number  of  cells  in  the  occupancy  map, 

M  =  total  number  of  agents, 

P  =  calculated  force  contribution  from  specified 

payloads,  and 

jlx  =  tuning  coefficients. 

Then,  the  resultant  heading  angle  of  the  Mth  agent  is 
given  by: 


The  calculation  of  P  will  vary  with  payload  type  as 
the  value  and  weighting  of  each  payload  type  can  and 
should  have  varying  degrees  of  influence  on  the 
resultant  potential  force  calculation. 

Tuning  coefficients  (jix)  provide  a  mechanism 
whereby  the  relative  contribution  of  each  of  the 
potential  forces  can  be  directly  tuned  either  as  an 
independent  variable,  or  dynamically  tuned  based  on 
a  feedback  mechanism.  This  feature  is  particularly 
important  with  multi-agent  systems  due  to  scalability. 
Since  the  potential  forces  contributed  by  elements 
within  the  system  are  cumulative,  it  becomes  very 
important  to  gain  an  awareness  of  the  proportional 
contributions  and  subsequently  be  able  to  control 


them  proactively.  To  illustrate,  let  us  assume  an 
occupancy  grid  of  120X240  cells  or  a  total  of  28,800 
total  cells.  Each  cell  might  offer  a  contributing 
charge  value  from  0  to  1  meaning  a  span  of  28,800 
charge  units.  If  other  contributors  merely  summed  to 
a  maximum  of  1,000  charge  units,  the  cells’ 
contribution  would  easily  overwhelm  any  potential 
contribution  calculation. 

The  occupancy  map  cell  characteristic  vector 
represents  the  probability  that  the  cell  is  occupied.  A 
value  of  1.0  indicates  that  there  is  a  100  percent 
chance  that  the  cell  is  occupied.  A  value  of  0.0 
indicates  that  there  is  a  100  percent  chance  the  cell  is 
unoccupied.  A  value  of  0.5  would  indicate  an  equal 
probability  that  the  cell  is  either  occupied  or 
unoccupied.  Separate  variables  that  provide  valuable 
influence  in  the  force  calculation  are  the  number  of 
times  that  a  cell  has  been  sampled  by  a  sensor  and  the 
number  of  times  that  a  percentage  of  the  entire 
occupancy  map  has  been  visited.  The  usefulness  here 
is  to  accommodate  the  dynamic  nature  of  agents 
moving  on  the  stage  as  well  as  the  potential  for 
moving  targets  in  the  field  of  regard.  As  a  result,  the 
charge  associated  with  each  cell  is  given  by: 

PoRmax  for  0<V<n  (8) 

0  for  V>n 

where, 

P0  =  cell  occupancy  probability, 

Rmax  =  maximum  possible  cell  charge, 

V  =  number  of  visits  to  the  ith  cell,  and 

TL  =  number  of  times  that  a  percentage  of  the  cells 

have  been  visited. 

SIMULATION  ENVIRONMENT 

Cyberbotics’  Webots  simulation  package  provides  an 
extremely  robust  capability  and  interface  with  very 
high  resolution  characterization  of  the  robots  in  a 
physics-enabled,  three-dimension,  virtual  world. 
The  robot  agents  utilized  are  the  e-puck  robot 
originally  developed  by  the  Ecole  Polytechnique 
Federale  de  Lausanne  (EPFL)  in  Lausanne, 
Switzerland.  E-puck  is  principally  an  educational 
robot  that  offers  significant  research  capability  both 
through  its  own  autonomous  processing  and  sensing 
capability,  as  well  as  its  close  linkage  with  Webots. 
E-puck  is  approximately  7  cm  in  diameter  and  has  a 
number  of  sensors  including  eight  infra-red  sensors 
positioned  around  the  robot’s  perimeter  that  provide 
both  proximity  sensing  and  light  sensing,  a  640x480 
pixel  camera  on-board  that  can  provide  video  and 
imagery  in  either  color  or  black  and  white  modes, 
three  omni-directional  microphones  for  sound 


localization,  a  three-axis  accelerometer,  and  on-board 
speaker.  Communications  is  provided  by  Bluetooth. 
Programming  is  in  C  and  compiled  via  a  GNU  GCC 
compiler  system. 

The  representative  simulation  and  experimental  agent 
arena  for  this  research  consists  of  a  120  cm  by  240 
cm  stage  bordered  by  a  low  wall.  The  stage  is  sub¬ 
divided  into  a  representative  occupancy  mapping  grid 
consisting  of  1  cm  by  1  cm  cells  each  having  an 
associated  vector  representing  dynamic 

characteristics  such  as  the  probability  of  occupancy 
or  potential  charge.  The  green  and  white 

checkerboard  appearing  in  the  simulation’s  stage  is 
physically  10  cm  x  10  cm  and  exists  merely  to 
identify  rough  approximation  of  localization.  The 
stage  also  hosts  obstacles  of  varying  shapes  and  sizes 
as  well  as  identified  agent  starting  points.  In 
simulation,  the  localization  of  each  agent  is  provided 
by  a  surrogate  global  positioning  system  function 
afforded  through  the  Webots  simulation  software  and 
in  contrast  to  the  real  e-puck’s  lack  of  such 
capability.  In  keeping  with  the  representation  and 
standard  established  in  the  Webots  simulation 
package,  the  world  coordinate  system  is  represented 
as  illustrated  in  the  figure  below. 


In  order  to  visualize  and  track  the  progress  of  the 
simulation  run  and  its  representative  evaluation 
metrics,  a  MATLAB  figure  is  generated  every  100 
time  steps  that  offers  information  on  the  e-pucks’ 
track  history,  cell  charge,  occupancy  mapping  and 
cell  visit  count  (Figure  3). 

RESULTS 

The  payload  capabilities  at  this  stage  in  the  research 
are  restricted  to  the  proximity  sensors.  Future  work 
will  add  the  e-puck's  camera  imaging  and  video.  The 
simulated  stage  replicates  that  forecast  for  the 


experiment  and  consists  of  a  bounded  120  cm  by  240 
cm  flat  arena  with  an  unknown  number  of  obstacles 
of  unknown  size,  shape,  and  color.  The  scenario 
used  is  to  have  the  MAS  search  the  bounded  stage 
and  populate  the  occupancy  map  with  information 
collected  by  the  proximity  sensing.  The  scenario  is 
run  for  a  constant  2,000  time  steps  per  simulation 
experiment.  The  independent  variable  considered  is 
the  number  of  agents  (e-pucks),  from  one  to  four, 
comprising  the  multi-agent  system.  The  dependent 
information  collected,  reduced,  and  evaluated  are  the 
resultant  occupancy  maps,  cell  charge  mapping;  cell 
visit  count;  and  map  coverage  count  (Figure  3).  The 
data  itself  is  represented  in  three-dimensional  graphs 
with  dimensional  scaling  on  the  x  and  z  axes  (the 
horizontal  plane  in  the  graph),  and  values  vertically. 
The  range  on  the  value  axis  is  the  same  for  all  four 
charts  to  facilitate  qualitative  analysis,  i.e.,  the  charts 
are  all  proportionally  similar. 
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FIGURE  3  -  Simulation  Metrics  Output  Screen 

There  are  several  very  interesting  observations  and 
characterizations  derived  from  the  reduced  and 
analyzed  data. 

1.  The  map  coverage  count  increases  non-linearly 
with  additional  robots  in  the  multi-agent  system. 
Figure  4  shows  the  map  coverage  count  witnessed 
from  the  simulation  runs  of  the  various  sized  MASs. 
This  is  an  expected  result  given  the  time/space 
efficiencies  offered  by  MAS  in  consideration  of  inter- 


robot  repulsive  forces  included  in  the  path  planning 
algorithm. 
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FIGURE  4  -  Map  Coverage  Count 

2.  The  map  coverage  count  curve  of  a  fixed  size  MAS 
approaches  an  asymptote.  This  is  an  unexpected 
result  as  one  would  expect  the  map  coverage  count  to 
continue  increasing  as  the  robots  continue  to  search 
the  space.  One  observation  however  is  that  this  is  a 
local  asymptote  whereby  continuing  the  search  would 
show  another  exponential  growth  of  the  map 
coverage  count  to  a  second  asymptote.  Alternately, 
this  could  very  well  be  an  indication  that  the  space 
has  been  exhaustively  searched  such  that  a 
confidence  could  be  declared.  This  phenomenon  will 
be  a  focus  of  future  research. 

3.  The  potential  energy  function  offers  efficient  and 
uniform  coverage  capability  in  a  search  mode.  The 
data  represented  proves  to  be  uniform  over  the  stage 
space  and  comprehensively  covers  the  stage  itself. 
This  is  quite  expected  given  the  influence  that 
occupancy  mapping  has  and  the  related  magnification 
of  unexplored  space  therein.  The  one  exception 
noted  is  discussed  in  the  next  item. 

4.  Obstacles,  more  specifically  the  charge  mapping 
of  occupancy  cells  comprising  the  obstacle  space, 
skew  the  path  planning.  This  was  a  shortcoming 
noted  in  published  research  [14]  and  reconfirmed 
herein.  This  is  a  result  of  the  fact  that  the  robot  can 
never  sense  inside  the  perimeter  boundary  of  the 
obstacles  such  that  the  cell  charges  within  are  never 
decremented. 

5.  The  relative  proportions  of  the  elements 
comprising  the  force  calculation  in  the  path  planning 
routine  can  vary  significantly  over  time.  This  too  is  a 
subject  ripe  for  further  investigation  in  continuing 
research.  As  an  example,  the  force  associated  with  a 
mature  occupancy  mapping  (i.e.,  a  well  known 
environment)  is  possibly  only  1-5  percent  of  that 
associated  with  an  unknown  environment.  In 


addition,  the  collective  repulsive  force  from  robot  to 
robot  is  a  function  of  distance  between  the  robots,  but 
is  consistent  over  time  since  the  average  distance 
remains  somewhat  constant.  The  result  of  these  two 
facts  is  that  the  influence  of  robot  repulsion  over  time 
effectively  increases.  In  this  limited  case,  it  actually 
becomes  a  desirable  feature  as  one  would  endeavor  to 
spend  less  energy  searching  known  space.  However, 
future  research  will  closely  consider  this 
phenomenon  especially  as  it  applies  to  the  addition  of 
imaging  and  video  payloads  and  the  dynamic  nature 
of  their  information  contribution. 

6.  The  force  values  drop  precipitously  as  a  function 
of  distance.  The  physics  of  potential  energy 
functions  dictate  that  the  force  is  inversely 
proportionate  to  the  square  of  the  distance.  In  this 
research,  the  energy  physics  were  replicated  as  such. 
However,  Figure  5  shows  that  this  tends  to  greatly 
magnify  the  influence  of  proximate  cells  and  greatly 
diminish  distance  cells.  The  conclusion  drawn  here 
is  that  in  an  effort  to  normalize  cell  charges,  and  gain 
better  responsiveness  over  the  entirety  of  the  stage 
versus  proximate  cells,  research  should  explore 
varying  the  distance's  exponent.  Clearly,  this 
deviates  from  classic  potential  theory  and  defies  the 
laws  of  physics,  but  might  nonetheless  offer  some 
interesting  advantages  in  path  planning  efficiencies. 


FIGURE  5  -  Total  Potential  Force 


7.  Transition  from  global  path  planning  control  to 
local  collision  avoidance  control  exhibits  undesirable 
oscillations.  Global  path  planning  is  generated  using 
the  potential  function  while  local  collision  avoidance, 
triggered  by  obstacle  detection,  uses  classic 
Braitenberg  theory.  Simulation  runs  illustrate  that  as 
an  individual  robot  transitions  between  these  two 
control  states,  an  undesirable  oscillation  appears  at 
the  boundary  (refer  to  the  upper  edge  of  the  stage  in 
Figure  6).  This  phenomenon  results  from  two 
realities.  First,  the  cell  charges  tend  to  be  higher  at 
the  perimeter  boundaries  due  in  fact  to  this 
oscillation.  Therefore,  the  robots  strive  toward  that 
boundary.  Secondly,  the  existing  Braitenberg 
algorithm  is  tuned  to  simply  motion  reflect  off  of  the 


boundary  as  opposed  to  perimeter  follow.  This  too  is 
an  area  that  will  be  refined  in  subsequent  research  by 
exploring  modification  of  the  Braitenberg  matrix, 
tuning  factors  that  can  be  incorporated  into  the 
transition  space,  and  cell  charge  calculations  at 
obstacle  and  perimeter  boundaries. 
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FIGURE  6  -  Individual  E-puck  Trajectories 
CONCLUSIONS 

This  paper  explores  the  characterization  of  a  multi¬ 
agent  system  through  novel  application  of  a  potential 
force  function  that  includes  tuning  coefficients. 
Initial  excursions  were  run  in  simulation  that  varied 
the  number  of  robot  agents  teamed  to  accomplish  an 
area  search  mission  with  no  a  priori  target  or  obstacle 
information.  Data  reduction  and  evaluation  shows 
interesting  behaviors  for  global  path  planning  and 
local  collision  avoidance  and  set  the  stage  for  further 
investigation  of  potential  function  coefficient  tuning. 
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Abstract 

Formation  of  patterns  in  a  system  of  interacting 
units  of  heterogeneous  types  is  a  self-organized  behav¬ 
ior  which  is  seen  in  many  biological  systems.  Ear¬ 
lier  research  in  this  area  has  indicated  that  such  pat¬ 
tern  formation  behaviors  in  biological  cells  and  tis¬ 
sues  are  made  possible  because  of  difference  in  the 
adhesivity  between  different  types  of  cells  or  tissues. 
Inspired  by  this  differential  adhesivity  model,  in  our 
earlier  research,  we  had  presented  a  decentralized 
approach  based  on  differential  artificial  potential  to 
achieve  the  segregation  behavior  in  a  swarm  of  hetero¬ 
geneous  robotic  agents  in  which  agents  of  different  types 
fanned  spatially  separate  clusters.  In  this  paper,  we  ex¬ 
tend  that  work  by  presenting  an  approach  to  achieve 
aggregation  in  which  agents  of  different  types  get  uni¬ 
formly  mixed  with  each  other.  The  method  is  based  on 
the  proposition  that  agents  of  different  types  experience 
varying  magnitude  of  potential  while  interacting  with 
the  agents  of  different  types.  An  analysis  of  the  system 
with  the  proposed  approach  in  Lyapunov  sense  is  car¬ 
ried  out  for  stability.  Extensive  simulation  studies  and 
numerical  analysis  suggest  that  the  proposed  method 
would  lead  a  population  of  heterogeneous  agents  to  an 
aggregated  configuration. 

1.  Introduction 

Control  of  multiple  autonomous  vehicles  collabo- 
ratively  carrying  out  a  task  has  received  attention  of  sev¬ 
eral  researchers  working  in  the  area  of  mobile  robotics 
because  of  its  potential  to  achieve  unprecedented  per¬ 
formance  in  terms  of  efficiency,  ability  to  carry  out 
complex  tasks,  and  robustness.  Research  in  this  area 
has  been  motivated  by  applications  in  a  number  of 
fields  including  cooperative  search  and  rescue  opera¬ 


tion,  surveillance,  reconnaissance,  and  boundary  pro¬ 
tection.  One  of  the  very  first  applications  of  formation 
control  of  multiple  agents  was  behavioral  simulation  of 
flocks  of  birds,  herds  of  animals  and  schools  of  fish 
for  computer  graphics  by  Reynolds  [17].  Drawing  in¬ 
spiration  from  Reynolds’  approach,  many  researchers 
have  focused  on  designing  decentralized  controller  for 
achieving  flocking  behavior.  The  examples  include 
behavior-based  methods  [1],  leader-follower  technique 
[3, 4],  method  based  on  formation  constraint  and  virtual 
leaders/beacons  [5,  13],  and  Lyapunov  function  based 
methods  [13,  14,  15,  16].  The  concept  of  artificial  po¬ 
tential  has  been  used  in  robotics  by  many  researchers. 
For  example,  artificial  potential  has  been  used  for  path 
planning  [24],  manipulator  control  [23],  robot  naviga¬ 
tion  [9],  obstacle  avoidance  [8],  and  multi-robot  forma¬ 
tion  control  [10,  12,  13,  15]. 

Obtaining  a  desired  shape  and  pattern  of  the  for¬ 
mation  can  be  critical  for  a  mission  relying  on  coordi¬ 
nated  action  by  multiple  mobile  agents.  In  many  situa¬ 
tions,  it  may  not  be  possible  to  integrate  all  the  capabil¬ 
ities,  sensing  or  actuation,  required  for  different  kinds 
of  tasks  in  an  individual  robot.  Accordingly,  the  robots 
may  have  heterogeneous  abilities  for  sensing  and  actua¬ 
tion  that  will  enable  them  to  perform  specific  tasks.  Het¬ 
erogeneous  robots  must  be  able  to  self-organize  them¬ 
selves  in  a  mission  specific  manner  to  carry  out  tasks 
assigned  to  them.  The  main  contribution  of  this  paper 
is  in  the  synthesis  and  analysis  of  a  controller  that  can 
be  applied  to  a  swarm  consisting  of  two  types  of  robots 
so  that  the  swarm  forms  a  pattern  of  homogeneous  mix¬ 
ture  of  robots  in  which  nearest  neighbors  of  a  robot  of  a 
certain  type  are  robots  of  the  other  type.  In  our  ear¬ 
lier  papers  [11,  12],  we  presented  control  algorithms 
for  achieving  segregation  in  which  the  two  populations 
form  spatially  separate  clusters.  In  this  paper,  we  extend 
that  work  by  presenting  an  algorithm  that  will  accom¬ 
plish  the  aggregation  in  which  the  two  poulations  get 
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intermixed  homogeneously. 


served  in  biological  systems. 


2.  Segregation,  Aggregation,  and  Pattern 
Formation  in  Biology 

Complex  patterns  form  in  living  systems  as  a  result 
of  interactions  among  basic  constituent  units  and  ran¬ 
dom  processes.  Examples  include  formation  of  ocular 
dominance  stripes  in  visual  cortex  of  cat  and  monkey, 
and  formation  of  tentacle  pattern  in  hydra.  Of  particular 
relevance  to  current  work  is  the  pheomenon  of  sorting 
which  is  seen  in  several  biological  systems.  Examples 
include  sorting  of  cells  based  on  their  types  and  func¬ 
tionalities.  Cell  sorting  is  one  of  the  basic  phenomenon 
which  leads  to  formation  of  patterns  and  organs  in  liv¬ 
ing  organisms.  Most  of  the  strategies  or  models  in  liter¬ 
ature  that  can  explain  formation  of  patterns  rely  on  dif¬ 
ferential  attraction/inhibition.  For  example,  Swindale’s 
model  [19]  accounts  for  formation  of  ocular  dominance 
stripes  in  visual  cortex  based  on  local  activation  and  lat¬ 
eral  inhibition  mechanism  [6]  for  like  type  of  synapses, 
and  local  inhibation  and  lateral  activation  [2,  19],  the 
reverse,  for  the  unlike  type  of  synapses.  One  of  the 
most  popular  models  is  the  Turing  model  [22]  which 
is  reaction-diffusion  model  to  mathematically  represent 
the  transport  phenomenon  in  biological  and  natural  sys¬ 
tems.  This  model  tries  to  explain  the  interaction  of  par¬ 
ticles  with  the  environment  and  their  motion  in  space. 
In  early  1990s,  Graner  and  Glazier  [7]  proposed  a  lat¬ 
tice  based  modified  version  of  larg e-Q  Potts  model  with 
differential  adhesivity  to  explain  and  simulate  the  sort¬ 
ing  of  a  mixture  of  two  types  of  biological  cells.  In 
fact,  it  has  been  long  known  [18]  that  it  is  the  difference 
in  intercellular  adhesivity  that  leads  to  sorting  in  cer¬ 
tain  kinds  of  cellular  inetractions  while  homogeneous 
intermixing  in  other  kinds.  The  final  state  of  cell  con¬ 
figuration  is  achieved  when  the  overall  surface  energy  is 
globally  minimized.  Based  on  this  principle,  Steinberg 
[18]  postulated  that  two  types  of  cellular  units  A  and  B 
are  aggregated  when: 


WAB> 


( Waa  +  Wbb ) 
2 


(1) 


where  Waa  and  Wnn  represent  the  work  of  cohesion  be¬ 
tween  particles  or  cells  of  same  types  (i.e.,  between 
types  A  &  A,  and  B  &  B  respectively),  and  Wab  rep¬ 
resents  the  work  of  adhesion  between  cells  of  types  A 
and  B.  When  the  above  condition  is  satisfied,  the  over¬ 
all  surface  energy  is  minimzed  when  A  and  B  units  are 
alternately  arranged,  or  in  other  words,  the  two  popu¬ 
lations  are  intermixed.  The  method  for  aggregation  in 
artificial  mobile  agents  presented  in  this  paper  is  mo¬ 
tivated  by  this  differential  adhesivity  phenomenon  ob- 


3.  Problem  Formulation 

The  group  of  mobile  agents  consists  of  N  fully  ac¬ 
tuated  agents,  each  of  whose  dynamics  is  given  by  the 
double  integrator: 


<li  =  Pi 

Pi  =  Ui(t)  i=  1,2, ...  (2) 


where  q;  and  //,  are  //(-dimensional  position  and  veloc¬ 
ity  vectors  respectively  of  agent  i.  The  group  of  mobile 
agents,  considered  in  this  paper,  consists  of  two  differ¬ 
ent  types  of  agents:  type  A  and  type  B.  The  number  of 
agents  of  type  A  is  Na  and  that  of  type  B  is  Ng  such 
that  N  =  Na  +  Ng  .  The  objective  of  this  paper  is  to 
synthesize  a  controller  that  drives  the  whole  swarm  to 
asymptotically  flock  in  a  manner  that  the  swarm  forms 
a  homogeneous  mixture  of  robots  of  type  A  and  type  B, 
i.e.,  nearest  neighbors  of  a  robot  of  type  A  are  robots  of 
type  B  (and  vice-versa).  This  emergent  behavior  of  for¬ 
mation  of  homogeneous  mixture  is  referred  to  in  this  pa¬ 
per  as  aggregation.  Agents  are  said  to  flock  (asymptoti¬ 
cally)  when  all  agents  achieve  the  same  velocity  vector, 
distances  between  the  agents  are  stabilized,  and  no  col¬ 
lisions  occur.  In  order  to  define  the  aggregation  mathe¬ 
matically,  we  can  define  metrics  for  each  agent  i  which 
represent  average  distances  between  that  agent  and  its 
neighbors  (set  of  agents  Nj  within  a  range  of  Rnn)  of 
each  type.  Hence,  for  each  agent  i  of  type  X  £  {A,B}, 
we  define  metrics: 

R*Y  =  W\  Wqi~qM  (3) 

rV‘  I  ieNY 


where  Y  £  {A.B\ ,  and  Nf  is  the  set  of  agents  of  type  Y 
in  Ni .  Furthermore,  we  define: 


=  _  V  Raa 

avg  Na£a  * 
r»B  =  —  y  rbb 

aVg  Nr  ^  ' 
L^B  ieB 

r*B  = _ I _  V  Rab 

avg  NA+NBiefUB  ‘ 


(4) 

(5) 

(6) 


Now  Aggregation  is  defined  as  a  configuration  of  agents 
where  the  average  distance  between  the  agents  of  like 
types  (type  A  or  type  B)  is  more  than  the  average 
distance  of  agents  between  the  unlike  types  (between 
agents  of  type  A  and  type  B).  Alternatively, 

/A  >/*  fMB  >  fAB  (7) 

'  avg  ^  '  avg  i  '  avg  ^  '  avg  V  '  > 
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The  above  condition  is  based  upon  the  average  prox¬ 
imity  of  agents  of  different  types  in  the  defined  neigh¬ 
borhood  of  a  given  agent.  Another  metric  can  be  based 
upon  the  distance  to  the  nearest  neighbor  of  each  type 
from  a  given  agent.  The  distance  of  the  nearest  neighbor 
of  agent  i  of  type  X  £  {A .  B\  is  given  by: 


RiNN  —  m'njeN[  ll'?1’  Qj  II 

(8) 

Based  upon  the  above  equation,  the  metric  can  be  de¬ 
fined  as: 

A4  —  1  V  RAA 

'avg(NN)  ~  at  Zj  IxiNN 
iyA  ieA 

(9) 

RB  -  J_  V  f?BB 

'  avg(NN)  At  Zj  ^iNN 

iyB  ieB 

(10) 

jAB  —  1  V  RAB 

avg{NN)  NA+NBiefUB  tNN 

(11) 

_  1  V  pBA 

*vg(NN)  Na  +NB.efuB 

(12) 

And  the  condition  for  aggregation  can  be  given  by: 


avg(NN) 


■*>  yAB 

'avg(NN)> 


.JIB  BA 

avg(NN)  ^  '  avg(NN) 


(13) 


4.  Control  Law  Formulation 


where,  a  is  a  scalar  control  gain,  and  q,j  =  \\qj  —  qi\\. 
The  parameter  do  represents  the  inter-agent  distance  be¬ 
low  which  (i.e.  when  qtj  <  do)  the  interaction  force 
is  repulsive  (negative)  and  above  which  (i.e.  when 
qij  >  do)  the  interaction  force  is  attractive.  Figure  1 
shows  the  potential  function  plotted  against  the  inter¬ 
agent  distance.  As  indicated  in  the  figure,  the  potential 
becomes  minimum  when  the  inter-agent  distance  is  do. 
The  initial  condition  and  dynamics  exclude  the  situa¬ 
tions  where  qt  =  qj  where  the  control  law  (14)  is  unde¬ 
fined. 


Interaction  Potential 


Figure  1.  Interaction  Potential  versus  Inter- 
Agent  Distance 


This  section  presents  the  control  law  which  causes 
a  population  of  heterogeneous  agents  to  asymptotically 
flock  as  well  as  aggregate.  For  a  system  of  N  mobile 
agents  with  NA  agents  of  type  A  and  Ng  agents  of  type 
B,  the  following  feedback  control  law  is  considered: 

Mi  =  -  E  VqiViAhj-<lA\)-a  X  ( Pi-Pj )  d4) 

jeNi  jeNi 

where  u,  is  the  control  input  to  the  agent A,  is  the  set 
of  agents  in  the  neighborhood  of  agent  i,  Vq ( 1 1 qj  —  g;-||) 
is  the  artificial  potential  of  interaction  between  agents 
i  and  j,  || qj  —  q, -||  is  the  norm  of  vector  ( qj  —  qt)  rep¬ 
resenting  the  Euclidean  distance  between  agent  i  and 
j,  and  V9;  is  the  gradient  with  respect  to  coordinates 
of  agent  i  i.e.,  qj.  First  term  in  Equation  (14)  repre¬ 
sents  the  gradient  of  potential  function,  and  the  second 
term  represents  damping  and  causes  the  agents  to  match 
their  velocities  with  each  other.  The  artificial  potential 
is  a  non-negative  function  of  relative  distances  between 
a  pair  of  neighbors  given  by  Vq  (7/,  .qj) :  R2m  —>R>  o .  Ar¬ 
tificial  potential  function,  Vq  due  to  interaction  between 
two  agents  i  and  j  can  be  expressed  1 13]  as: 

Vij  =  a  ^ln(qij)  +  (15) 


Since  there  are  two  types  of  mobile  agents  involved 
in  the  system,  there  are  three  different  kinds  of  artificial 
potentials  involved:  a)  Potentials  arising  due  to  interac¬ 
tions  between  types  A  and  A  (V/^4),  b)  Potentials  arising 
due  to  interactions  between  types  B  and  B  (VqB),  and 
c)  Potentials  arising  due  to  interactions  between  types 
A  and  B  (Vf-B).  Our  main  result  for  cause  of  aggre¬ 
gation  is  based  on  the  concept  of  differential  potential, 
i.e.,  agents  experience  different  magnitudes  of  potential 
when  they  are  interacting  with  the  agents  of  different 
types.  This  is  achieved  by  assuming  different  values  of 
the  parameter  do  for  different  types  of  interaction  po¬ 
tentials.  Explicitly,  in  Equation  (15), 

doA  =  doB  if  {i,j  eA)or(i,j  £B) 
d§B  if  (i  £  A,  j  £  B)or(i  £  B,j  £  A) 

(16) 

The  control  law  for  aggregation  can  be  achieved  when: 

d(lA  =  d$B  >  d^B  (17) 

Figure  2  shows  the  plot  of  force  of  interaction  due  to 
similar  types  and  due  to  dissimilar  types  of  robots  ver¬ 
sus  inter-agent  distance  when  the  condition  (17)  for  seg¬ 
regation  controller  is  met.  In  this  case,  it  can  be  seen 
that  the  interaction  force  between  agents  of  same  types 
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is  lesser  than  that  between  agents  of  different  types  at 
any  given  distance.  Hence,  this  method  of  aggregation, 
based  on  differential  potential,  is  analogous  to  Stein¬ 
berg’s  [18]  explanation  of  cellular  aggregation  based  on 
differential  adhesiveness  (see  Equation  (1)). 


Interaction  Force 


Figure  2.  Interaction  Force  between  Agents 
5.  Stability  Analysis 

In  this  section,  we  carry  out  an  analysis  of  conver¬ 
gence  and  stability  properties  of  the  system  of  multiple 
agents  obeying  dynamics  given  by  Equation  (2)  under 
control  law  given  by  Equation  (14).  In  order  to  carry 
out  the  stability  analysis  of  the  collective  motion  of  the 
agents,  the  following  positive  definite  function  can  be 
chosen  as  the  Lyapunov  function: 

0(q,p)=y(q)  +  ip7'p  (18) 

where  q  €  RmN  is  stacked  position  vector  of  all  agents, 
p  £  R'nN  is  stacked  velocity  vector  of  all  agents,  and 
V  (q)  :  R'"n  — ►  R>0  is  the  total  potential  energy  of  the 
system  which  consists  of  three  parts  resulting  from  in¬ 
teractions  between  agents  of:  1)  type  A,  2)  types  A  and 
B,  and  3)  type  B.  This  can  be  written  as: 

v(q)  =  VAA(q)  +  VBB(q)  +VAB(q) 

=  ^X  X  ^(lki-^ll)+XX%(lkf-^ii) 

z  ieA  jeAjj^i  ieA  jeB 

+  ^X  X  Vij(\\qj-qi\\)  (19) 

Z  ieBjeBJjti 

The  collective  dynamics  of  the  system  is  given  by: 

q  =  P  (20) 

p  =  -W(q)-L(q)p  (21) 


where  L(q)  G  j^nNxmN  js  ^-dimensional  graph  Lapla- 
cian  (see  reference  [15]).  Graph  Laplacian  represents 
the  interconnections  present  in  a  graph  in  a  matrix 
form,  and  can  be  obtained  from  other  graph  theoretic 
quantities  such  as  degree  matrix  and  adjacency  matrix. 
Among  other  important  properties  of  graph  Laplacian 
matrix  L(q),  it  is  a  positive  semi-definite  matrix. 

Lemma  5.1.  Consider  a  system  of  N  mobile  agents. 
Each  of  the  agents  follows  the  dynamics  given  by 
Equation  (2),  and  with  feedback  control  law  given 
by  Equation  (14).  For  any  initial  condition  be¬ 
longing  to  the  level  set  of  0(q,p)  given  by  Oc  = 
{ (q,  p)  :  0  (q,  p)  <  C)}  with  C  >  0,  and  when  the  under¬ 
lying  graph  of  the  system  is  fully  connected  all  the  time, 
then  the  system  asymptotically  converges  to  the  largest 
invariant  set  in  £2/  C  Oc-  The  points  in  the  largest  in¬ 
variant  set  in  0[  have  a  velocity  that  is  bounded ,  the 
velocities  of  all  agents  match,  and  the  total  potential  of 
all  agents  given  by  Equation  (19)  approaches  a  local 
minimum. 

Proof.  Differentiating  0(q.p)  with  respect  to  time  and 
using  Equation  (21)  one  gets: 

<Kq,p)  =  prvy(q) +prp 

=  pr  W  (q)  +  pr(— VL  (q)  —  L(q)p) 

=  -pr£(q)p 

-XXllPf-^l|2<°  (22) 

i  i 

From  Lasalle’s  Invariance  Principle,  all  solutions  of  the 
system  starting  in  0(  will  converge  to  the  largest  invari¬ 
ant  set  in  Oi  =  {(q,p)  G  Oc  :  0(q,p)  =  0)},  and  this 
happens  when  the  velocities  of  all  agents  match.  For 
a  detailed  proof  of  this,  please  see  references  [13,  15]. 
Furthermore,  in  the  steady  state,  the  velocities  of  the 
agents  do  not  change,  i.e.,  p  =  0.  For  a  proof  of  this 
result,  see  references  [20,  21],  From  Equation  (21),  it 
follows  that  at  the  steady  state: 

W(q)=0  (23) 

Hence,  the  total  potential  of  all  agents  is  locally  mini¬ 
mized.  □ 

6.  Simulation  Results  and  Discussions 

Extensive  simulations  were  carried  out  to  verify 
that  the  proposed  controller  would  lead  to  aggregation. 
In  the  simulations,  the  following  parameters  were  as¬ 
sumed:  dfr4  =  dBB  =  6,  and  dgB  =  3. 

Figure  3  shows  the  configuration  of  a  population 
of  20  agents  (10  each  of  type  A  and  B)  in  a  2D  space 
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at  different  times  during  the  simulation.  The  agents 
started  off  at  a  random  configuration,  and  control  law 
given  by  Equation  (14)  based  on  differential  potential 
was  applied  to  the  agents.  The  final  configuration  at 
time  T=750  sec  shows  that  the  agents  of  types  A  and  B 
form  a  homogeneous  mixture. 


X  Axis  (m)  X  Axis  (m) 


Figure  3.  Configurations  of  Agents  at  Times 
T=0  (top  left),  T=250  sec  (top  right),  T=500  (bot¬ 
tom  left),  and  T=750  sec  (bottom  right) 

Figure  4  shows  the  plot  of  average  distances  be¬ 
tween  agents  of  types  A  and  A  (r~ ),  B  and  B  (r^j, 
and  A  and  B  (r£fg)  versus  time  for  the  above  simula¬ 
tion.  At  the  final  configuration,  the  average  distances 
r^g,  r^g,  and  ®  were  found  out  to  be  4.87,  4.87,  and 
4.43  respectively,  which  clearly  shows  that  the  popula¬ 
tion  was  aggregated  based  on  the  condition  given  by  (7). 
Also,  on  average:  distance  of  the  nearest  type  A  neigh¬ 
bor  of  type  A  robot  was  3.12;  distance  of  the 

nearest  type  B  neighbor  of  type  A  robot  (j^giNN\)  was 
1.98;  distance  of  the  nearest  type  B  neighbor  of  type 
B  robot  (r^^jyjy))  was  3.12;  and  distance  of  the  near¬ 
est  type  A  neighbor  of  type  B  robot  )  was  1.98. 

Hence,  aggregation  happens  according  to  condition  (13) 
as  well. 

The  results  given  above  were  for  just  one  simula¬ 
tion  run.  In  order  to  verify  that  the  method  presented  in 
this  paper  leads  to  aggregation  in  general  in  a  population 
of  heterogeneous  agents,  an  extensive  simulation  study 
was  carried  out  in  which  100  runs  were  performed.  Fig¬ 
ure  5  shows  the  average  distance  between  the  agents  at 
steady-state.  In  each  of  the  simulation  runs,  the  pop¬ 
ulation  of  agents  consisting  of  types  A  and  B  was  ini¬ 
tialized  in  a  random  configuration  obtained  via  uniform 
distribution  of  agents  in  2D  space,  and  number  of  agents 
of  type  A  and  B  were  each  chosen  randomly  between 
values  5  and  15.  Each  of  the  runs  was  carried  out  for 


Average  Distances 


Figure  4.  Average  Distances  Between  Agents 
of  Types  A  and  A  (r^),  B  and  B  (rf"),  and  A 
and  B  (r*») 

750  seconds  of  simulation  time.  The  average  distances 
between  agents  shown  in  the  figure  are  calculated  at  the 
steady  (final)  state.  It  can  be  easily  seen  that  the  aver¬ 
age  distance  between  agents  of  type  A  (r£fg )  and  aver¬ 
age  distance  between  agents  of  type  B  (rf^)  is  larger 
than  the  average  distance  between  agents  of  type  A  and 
B  (r^yg)  for  each  of  the  simulation  runs.  Also,  the  plot 
of  average  distances  to  respective  nearest  neighbors  for 
each  types  of  agents  for  all  simulation  runs  are  shown  in 
Figure  6.  This  figure  also  demonstrates  that  aggregation 
happens  for  each  simulation  run  according  to  condition 

(13). 


Average  Distances 


Figure  5.  Average  Distance  between  Agents 


7.  Conclusions 

The  paper  presents  a  decentralized  method  to 
achieve  self  organized  bahavior  of  aggregation  in  a  pop¬ 
ulation  of  heterogeneous  agents.  In  this  behavior,  the 
heterogeneous  agents  form  a  homogeneous  mixture. 


Ill 


Average  Nearest  Neighbor  Distances 


Figure  6.  Average  Nearest  Neighbor  Distance 
between  Agents 


The  method  is  based  on  the  concept  of  differential  ar¬ 
tificial  potential.  The  paper  presents  the  stability  anal¬ 
ysis  of  a  population  of  agents  in  Lyapunov  framework, 
and  lays  down  an  analytical  foundation  for  synthesis  of 
controllers  for  aggregation  in  artificial  potential  func¬ 
tion  framework.  Extensive  simulation  studies  verify  the 
results  obtained  in  this  paper,  and  show  the  effective¬ 
ness  of  the  proposed  method  in  achieving  the  aggrega¬ 
tion  behavior  in  a  swarm  of  heterogeneous  agents. 
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A. 2. 13  On  the  Stability  of  Swarm  Consensus  under  Noise 
Control 

The  following  paper  (on  the  next  8  pages)  was  co-authored  by  Gregory  K.  Fricke, 
Bruce  Rogers,  and  Devendra  P.  Garg,  and  appeared  in  the  Proceedings  of  the 
4th  ASME  Dynamic  Systems  and  Control  Conference,  Volume  1,  on  pages  291- 
298,  published  and  presented  in  October  2011. 
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ABSTRACT 

Representation  of  a  swarm  of  independent  robotic  agents  un¬ 
der  graph-theoretic  constructs  allows  for  more  formal  analysis  of 
convergence  properties.  We  consider  the  local  and  global  conver¬ 
gence  behavior  of  an  /V- member  swarm  of  agents  in  a  modified 
consensus  problem  wherein  the  connectivity  of  agents  is  gov¬ 
erned  by  probabilistic  functions.  The  addition  of  a  random  walk 
control  ensures  e-stability  of  the  swarm  consensus.  Simulation 
results  are  given  to  show  the  rate  of  convergence  to  consensus 
and  planned  experiments  are  described. 

1  Introduction 

Control  of  a  swarm  of  robots  may  be  achieved  in  many  dif¬ 
ferent  ways.  The  seminal  work  of  Reynolds  [1]  gave  the  first 
algorithmically  efficient  representation  of  a  flock  with  very  sim¬ 
ple  rules.  Reynolds’  work  was  ground-breaking,  allowing  sim¬ 
ulation  of  the  group  dynamics  involving  very  large  numbers  of 
members.  Interestingly,  the  fundamental  work  of  Braitenberg  [2] 
in  the  area  of  behavioral  robotics  predates  Reynolds’  research, 
though  that  work  was  focused  mostly  on  individual  robots  or 
robot  pairs.  This  idea  was  further  developed  by  additional  re¬ 
search  in  swarm  robotics  by  Mataric  [3]  and  Parker  [4].  Early 
work  was  quite  ad-hoc,  but  behavioral  control  continues  to  find 
an  increasingly  rigorous  mathematical  framework.  For  variations 
on  this  topic,  see,  e.g.,  [5-14]. 

Swarming  laws  may  be  applied  to  a  variety  of  cooperative- 
robotics  settings.  Each  of  these  settings,  though,  has  a  primary 
goal  of  achieving  agreement  or  consensus.  The  specific  mean¬ 
ing  of  agreement  in  each  scenario  is  different,  but  in  general 
the  implication  is  that  the  states  of  each  robot  evolve  in  a  co¬ 
ordinated  way  to  achieve  some  goal.  A  few  specific  types  of 
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agreement  are  creation  and  maintenance  of  formation  [7, 13-15], 
robot  rendezvous  [16],  plume  localization  [17],  and  robot  segre¬ 
gation  [18].  In  this  paper,  we  are  concerned  with  rendezvous  and 
formation  control. 

Formation  control  may  be  accomplished  under  several  con¬ 
trol  regimes.  Clearly,  a  centralized  controller  could  drive  robots 
to  a  desired  configuration  under  straightforward  trajectory  con¬ 
trol.  It  is  much  more  difficult  and  interesting  to  develop  such 
capability  under  a  distributed  control  paradigm,  allowing  greater 
(if  not  unlimited)  scalability  of  the  swarm.  Distributed  control  of 
formations  falls  into  the  control  regimes  of  potential  functions, 
geometric  control,  and  graph-based  control.  Several  additional 
examples  of  distributed  formation  can  be  found  in  the  works  of 
Yun  [19],  Balch  [7],  Yamaguchi  [20],  and  Antonelli  [21], 

In  Section  2,  we  review  the  basic  notation  of  graph-based 
control  and  offer  a  control  framework  based  on  potential  field 
functions,  which  are  used  to  balance  the  relative  importance  of 
control  goals  such  as  obstacle  avoidance,  goal-seeking,  and  robot 
following  (or  avoidance)  [12,22-27].  Section  3  provides  a  proof 
of  asymptotic  convergence  using  the  potential  field  framework, 
and  Section  4  uses  simulations  to  explore  the  rates  of  conver¬ 
gence.  We  present  some  planned  experiments  in  Section  5  and 
our  conclusions  in  Section  6. 


2  Graphical  Representation 

A  brief  discussion  of  graphs  is  in  order.  Graph  notation 
varies  in  the  literature;  the  following  notation  will  be  used  in 
this  paper.  An  undirected  graph  Q  =  Q(V,E)  of  order  A  consists 
of  a  set  of  A  nodes  or  vertices  Vg  =  {1,2,  ...,A}  and  a  set  of  con¬ 
nections  or  edges  Eg  C  {Vg  x  Vg } .  If  a  pair  of  nodes  is  in  the 
set  of  edges,  (i.j)  £  Eg,  the  nodes  are  adjacent  and  are  called 
neighbors,  indicated  by  i  ~  j  or  ij.  The  set  of  neighbors  of  a 
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node  i  is  given  by  3V*  =  {j  €  Vg  \  j  ~  ;}.  Note  that  in  this  case, 
loop  edges,  e.g.,  (i,i)  €  are  allowed.  When  it  is  clear  from 
the  context,  the  subscript  will  be  dropped,  e.g.,  V  =  Vg,  E  =  Eg, 
and  g(V,E)  =  Q{Vg,Eg). 

The  degree  of  a  node  is  the  cardinality  of  9^,  denoted  by 
A,-  =  I-  The  vector  of  degrees  of  the  nodes  of  Q  is  denoted 
AQ  =  [A] ,  ....A,. .... A.vj7'.  If  every  node  is  connected  to  every 
other  node  except  itself  (so  that  A,-  =  N  —  1  for  all  vertices  i)  then 
graph  (]  is  called  complete.  A  complete  graph  of  order  A  is  also 
denoted  by  Ay. 

The  adjacency  matrix  representing  the  neighborhood  rela¬ 
tions  among  agents  is  defined  in  Eq.  (1). 


0  otherwise 

The  adjacency  matrix  leads  to  an  important  data  structure  for 
studying  aggregation  and  agreement,  the  graph  Laplacian,  de¬ 
fined  as: 

A(£)=diag(A£)-.3.  (2) 

Note  that  the  degree  vector,  A  Q,  is  equivalent  to  the  vector  of 
row-sums  of  J4;  in  this  way,  the  definition  is  easily  extended 
to  weighted  networks.  Also  notice  that  L(Q)  is  non-negative 
definite.  The  graph  Laplacian  is  intimately  related  to  diffusions 
on  the  network,  and  the  eigenvalues  and  eigenvectors  of  L(Q) 
encode  geometric  information  about  the  graph  (] .  For  exam¬ 
ple,  the  number  of  zero  eigenvalues  of  L(Q)  corresponds  to  the 
number  of  connected  components  of  (] .  Additional  information 
found  within  the  spectra  of  E(Q),  XL,  L(QW),  and  XL„,  are  dis¬ 
cussed  in  the  1988  monograph  of  Cvetkovic  el  al  [28],  as  well  as 
in  [29-35], 

Graph- theoretic  control  of  formations  is  not  novel;  the  re¬ 
ported  work  of  Tanner,  Jadbabaie,  and  Kumar  [13,14]  expounded 
on  this  topic.  More  recently,  an  extensive  set  of  research  has 
been  conducted  regarding  the  controllability  of  such  swarms  un¬ 
der  graph-theoretic  constructs.  Mesbahi  published  several  pa¬ 
pers  [36-38]  on  controllability  within  a  swarm  for  agreement, 
along  with  several  related  works  by  Hatano,  Das,  Rahmani, 
Chen,  Kim,  and  Tan  relating  to  the  study  and  manipulation  of  the 
graph  Laplacian  and  its  spectra  for  swarm  control  [15,39^-2], 
Of  particular  focus  in  the  proposed  research  is  the  extension  of 
such  graph-theoretic  controllability  concepts  to  random  graphs 
along  the  lines  of  Hatano  and  Mesbahi  [43,44],  The  extension 
to  the  dual  of  controllability,  observability,  has  additionally  been 
explored  by  Mesbahi  and  Zelazo  [45,46]. 

The  key  element  in  the  works  of  Mesbahi  is  the  notion  of 
the  state-dependent  graph.  In  general,  suppose  each  of  the  N 
agents  sits  in  an  «-dimensional  state  space,  so  that  agent  i’s  state 
is  X-,  A  W'.i  =  1 ....  .A'.  For  example,  the  state  vector  x,-  may  have 
coordinates  determining  agent  i’s  position  and  momentum.  We 


denote  the  collective  state  of  the  system  x  =  [xi  ,X2,  • . .  ,x,y]7’,  and 
X  C  RiV”  is  the  set  of  possible  states.  The  state  dependent  graph 
mapping  X  — >  g(X )  sends  each  configuration  of  system  states  to 
a  graph  g(X).  The  agents  form  the  vertices  of  g(X)  and  (i,j)  is 
an  edge  if  and  only  if  ||x,  —  xy 1 1 2  <  p,  for  some  p  >  0  and  i  A  j. 
The  neighborhood  of  agent  i  in  g(X),  written  ‘j\[1(X),  is  state  de¬ 
pendent.  The  2-norm  has  an  obvious  physical  meaning  that  is  di¬ 
rectly  applicable  to  real  connectivity  problems  such  as  wireless 
networking  (where  p  indicates  the  communication  range),  or  a 
laser  rangefinder  (or  rangefinder  pair)  with  full  2n  angular  cov¬ 
erage  or  an  array  of  vision  sensors  (where  p  is  the  detection  range 
for  these  sensors).  Alternatively,  any  p— norm  may  be  used  in  a 
similar  fashion,  though  with  somewhat  lower  physical  meaning. 

The  goal  is  to  develop  a  distributed  control  which  forces  the 
agents  to  aggregate.  In  Eq.  (3)  and  Eq.  (4)  below,  a  swarming 
law  is  given  using  the  framework  of  state-dependent  graphs.  In 
general,  let  x, -(f)  be  the  «-dimensional  state  of  agent  i  at  time-step 
t,  and  define  the  updating  of  x,-  (t )  by  n— dimensional,  first-order 
dynamics: 

Xi(t+  1)  =Xi(t)  +Uj(t)  (3) 


Ui(t)=AW(t)-Bf(^{Xt),Xt)  (4) 

Here,  M,  (f)  is  the  self-determined  control  of  agent  i  at  time-step 
t,  and  /  is  some  linear  function  of  the  states  of  the  neighbors  of  i 
at  time  t  that  determines  the  aggregation  interaction.  In  the  next 
section,  /  can  be  thought  of  as  the  average  of  the  positions  of 
the  neighbors  of  i.  The  objective  function  xV{t)  is  independent  of 
the  swarm  interaction,  and  the  real  constants  A,B  >  0  are  relative 
scaling  gains. 

There  are  several  examples  in  the  literature  of  necessary  and 
sufficient  graph  theoretic  conditions  which  force  agents  to  con¬ 
sensus  [47—49].  We  offer  a  simple  distributed  control  (not  a  gen¬ 
eral  theoretical  framework)  that  forces  agents  to  consensus.  If 
the  communication  network  depends  on  position,  A  =  0,  and  the 
aggregation  function  is  a  local  contraction,  the  agents  will  tend  to 
clump  together  in  separate  clusters.  In  the  next  section  we  show 
that  a  random  function  vP((f)  can  force  the  agents  to  eventual 
consensus  regardless  of  initial  conditions. 

3  Proof  of  Consensus  Stability 

Suppose  there  are  N  agents  distributed  in  a  compact  and  con¬ 
vex  region  D  cM.d  (generally,  d  =  2  or  3).  If  the  communication 
network  is  fixed  and  connected,  the  agents  can  steer  to  the  aver¬ 
age  position  of  their  neighbors  and  eventually  achieve  consensus. 
However,  if  the  communication  network  depends  on  the  position 
of  the  agent  and  some  fixed  sensing  radius  (much  smaller  than 
the  environment),  agents  will  not,  in  general,  converge  to  con¬ 
sensus  by  averaging  their  position  with  their  neighbors.  In  this 
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section  we  show  that  adding  a  small  random  perturbation  after 
the  averaging  step  can  force  the  agents  to  converge,  under  the 
right  conditions. 

The  key  conditions  are  (i)  a  reflecting  boundary  in  the  envi¬ 
ronment  and  (ii)  the  noise  to  sensing  radius  ratio.  There  are  two 
parts  to  the  argument.  First,  if  all  the  agents  can  sense  each  other 
and  the  size  of  the  random  jump  is  small  enough,  they  will  always 
be  near  each  other.  Second,  since  there  are  reflecting  boundary 
conditions,  the  agents  must  all  be  near  each  other  at  some  time. 
This  result  is  not  in  itself  groundbreaking,  but  it  is  interesting  that 
we  can  force  the  agents  to  converge  by  adding  noise.  Also,  the 
averaging-plus-noise  scheme  is  simple  to  implement  in  software 
and  actual  robot  systems.  In  the  rest  of  this  section,  we  make 
these  arguments  more  formal. 

For  i=  1, . .  .N,  Xi(t)  £  D  and  the  initial  conditions  x,(0)  are 
arbitrary.  Let  the  neighborhood  of  agent  i  and  time  t  be  A[ft)  = 
{j  £V  :  ||jtj(f)  —Xj(t) ||  <  p}  for  some  sensing  radius  p  >  0.  Here 
9fi(t)  since  i£0fjft),  and  typically,  R  <C  diam(D). 

The  dynamics  of  the  particles  are  given  by 


x,(f  +  1)  =  T,,-(f)  +  |iVi(f)|  1  Yj  */(0-  (5) 


a  random  perturbation  whose  measure  is  supported  on  an 
open  set  U,(t)  containing  the  origin.  Importantly,  we  must  re¬ 
strict  Uj(t)  so  that  the  jump  *P,(r)  cannot  take  the  agent  out  of  the 
domain  D.  The  restriction  can  be  considered  as  reflecting  bound¬ 
ary  conditions.  With  a  little  manipulation  these  dynamics  fit  the 
framework  of  Eq.  (3)  and  Eq.  (4).  Without  the  noise  term,  the 
agents  cannot  in  general  achieve  agreement  if  the  initial  graph  is 
disconnected.  With  certain  conditions  of  the  random  variables 
T,;(f),  once  all  the  particles  are  close  to  one  another,  they  stay 
close.  However,  their  position  will  not  be  fixed,  which  is  some¬ 
times  called  e-consensus. 

The  first  proposition  shows  that  if  the  size  of  the  jump  is 
bounded,  then  when  all  the  agents  are  near  each  other,  they  stay 
close. 


Proposition  1.  Taking  the  dynamics  defined  by  Eq.  (5),  sup¬ 
pose  the  '!/  are  i.i.d.  random  variables  drawn  from  a  distribution 
that  is  absolutely  continuous  with  respect  to  Lebesgue  measure 
and  supported  on  some  open  set  Uj(t )  containing  the  origin.  If 
diam(Uj(t))  <  p/2,  then  for  all  i  and  j 


=  %(t)  =  V  ^  max \\Xi(t  + 1)  -*/(*  + 1)||  <  p. 
ij 


Proof. 

max  ||x(f+l)  —Xj(t  +  1)|| 
hi 

™x  ii  jf  L**(0  +xvi  -  L  ■ -  ^ii 

=  max||vl,5  —  VF,'|| 

i,j 

<max(||^||  + ||^.||)  <p 
hj 

Drawing  a  random  jump  from  an  arbitrary  open  set  t/,(f) 
may  seem  overly  abstract,  but  since  we  have  not  yet  implemented 
the  algorithm  in  experiment,  we  are  striving  for  generality.  In  the 
next  section,  we  use  uniform  random  variables  on  a  disc.  How¬ 
ever,  the  proof  allows  for  any  bounded  distribution,  for  example, 
beta  or  exponential,  levy,  or  normal  (with  cut-offs).  Also,  the 
assumption  that  the  distribution  is  absolutely  continuous  with  re¬ 
spect  to  Lebesgue  measure  on  the  sets  Uft)  is  essential  in  the 
proof  of  Lemma  1  below.  We  now  wish  to  show  that  eventually 
the  particles  must  be  close  to  one  another. 

Lemma  1 .  Taking  the  dynamics  defined  by  Eq.  (5),  suppose  the 
are  i.i.d.  random  variables  drawn  from  a  distribution  that 
is  absolutely  continuous  with  respect  to  Lebesgue  measure  and 
supported  on  some  open  set  Uj(t )  containing  the  origin,  and  let 
E  C  D  be  open.  Also  suppose  that  the  diameters  of  all  the  sets 
Ufit )  are  bounded  away  from  zero.  Then  there  is  a  positive  con¬ 
stant  p  and  a  deterministic  time  T  such  that  for  all  i  =  1, ...  ,1V 
and  t  >  0 

P(xj(t  +  T)  £E)>  p. 

Proof.  Suppose  not.  Then  there  is  an  open  Eq  C  D,  some  agent 
a  €  V,  and  some  strictly  increasing  sequence  of  times  {t,},>i  C 
N  such  that  P(x0(t)  £  Eq)  =  0  for  i  >  1.  Take  r  >  0  such  that 
U  contains  the  ball  of  radius  r  centered  at  the  origin.  For  k  = 
1,2, ... ,  define  the  sets  Ek+i  =  {x  £  D  :  d(x,Ef)  <  r},  and  let 

ca(t)=mt)\-1  y  x&) 

be  the  center  of  agent  a’s  communication  group  at  each  time 
t.  For  k  >  0,  if  ca{t)  £  E^,  then  P(xa{t  +  1)  £  E^- 1)  >  0  since 
(ca  +  U)  n  If  |  is  non-empty  and  open  and  the  VIJJ,  are  drawn 
from  a  measure  absolutely  continuous  with  Lebesgue  measure. 
So  we  must  have  ca(x,)  f.  E\  for  all  i  >  1;  i.e.,  P(ca(fi)  £E i)  =  0 
for  i  >  1.  Now,  each  E^  is  open,  E^  C  £i+i  and  D  is  compact. 
Thus,  there  is  a  finite  positive  integer  F  such  that  D  C  E / .  By 
finite  recursion,  P(ca(fi)  £  E r )  =  0  for  all  i  >  1,  a  contradiction. 

The  above  proof  just  shows  that  if  an  agent  wanders  ran¬ 
domly  in  D,  it  eventually  returns  to  the  same  spot.  Theorem  1 
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puts  the  two  pieces  of  the  argument  together  by  saying  that  all 
the  agents  must  eventually  be  near  the  same  spot  at  the  same 
time.  Then  Proposition  1  gives  us  that  they  will  continue  to  be 
close  to  each  other. 

Theorem  1.  If  diam{U)  <  p/2,  there  is  a  random  time  x  such 
that 


P(max  \\xi(s) -Xj(s)\\  <  p)  =  1 
ij 


for  all  s  >  X. 

Proof  Take  any  ball  of  radius  R  Br  C  D.  By  the  lemma,  there 
is  a  positive  constant  ,u  and  a  deterministic  time  T  such  that 
P(xi(T)  G  Br)  >  fj  for  all  x,(0).  Then  set 


1  t{k) 


1  Xi(kT)  G  Br 
0  else. 


The  1  fk)  are  Bernoulli  random  variables  indexed  by  i  =  1, . . .  ,N 
and  k  =  1,2,....  Almost  surely,  there  is  ko  >  1  such  that 

flm)  =  i. 

!=  1 

Thus  set  x  =  koT  and  apply  Proposition  1. 


3.1  Discussion 

The  theorem  takes  advantage  of  the  random  walk  induced  by 
the  noise.  In  the  product  space  X  =  IT// ,  /I,  there  is  an  invariant 
set  around  the  diagonal,  and  the  noisy  dynamics  will  eventually 
force  the  agents  inside  the  attracting  set.  Significantly,  noisy  per¬ 
turbations  considered  are  quite  general,  so  it’s  possible  to  apply 
the  theorem  if  the  noise  isn’t  symmetric  about  the  origin.  That 
is,  we  can  consider  shaped  noise  that  is  realistic  for  the  sensing 
capabilities  of  robots. 

The  convexity  of  the  space  D  is  never  used  in  the  proof.  It 
is  needed  so  that  the  averages  in  the  agreement  algorithm  are 
well-defined.  If  we’re  careful  about  defining  line  of  sight  and 
which  agents  participate  in  the  averaging,  the  assumption  can  be 
relaxed  to  a  space  that  is  compact  with  open  interior.  This  allows 
for  spaces  with  obstacles.  If  the  space  is  not  compact,  the  agents 
may  wander  off  in  different  directions.  This  can  be  addressed  by 
giving  the  agents  a  notion  of  “center”  of  the  space  so  the  random 
walks  are  recurrent. 


4  Simulation 

The  previous  section  guarantees  convergence  to  consensus, 
but  the  theorem  makes  no  mention  about  the  size  of  the  domain  D 
or  the  number  of  agents.  Thus,  we  have  no  information  about  the 
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Figure  1 .  The  number  of  agents  in  the  maximal  connected  component 
for  simulations  in  the  smallest  environment  (W  =  20).  The  medians  of 
100  iterations  are  plotted.  Each  panel  has  a  fixed  number  of  agents  and 
varies  the  jump  size.  Crosses  are  R  =  0.5p;  circles  are  R  =  0.6p; 
squares  are  R  =  0.8p;  and  x’s  mark  R  =  p.  Upper  left:  N  =  10.  Upper 
right:  N  =  20.  Lower  left:  N  =  40.  Lower  right:  N  =  100. 


rate  of  convergence.  In  this  section,  we  simulate  agents  using  the 
dynamics  in  Eq.  (5)  to  explore  the  rate  at  which  the  noise  drives 
the  agents  to  consensus.  We  take  the  support  of  the  noise  process 
Uj(t)  to  be  a  disc  of  radius  R.  Without  loss  of  generality,  we  set 
the  sensing  radius  p  =  1,  and  the  domain  D  is  a  square  region  in 
R2 


4.1  Uniform  initial  conditions 

The  positions  of  the  agents  take  values  uniformly  at  random 
in  a  square  region,  and  three  widths  of  the  square,  W  =  20,60, 
and  100  are  considered.  In  each  of  the  regions,  we  also  test  four 
sizes  of  agent  population:  N  =  10,20,40  and  100.  Since  the 
dynamics  are  random,  each  parameter  setting  is  run  100  times 
from  initial  positions  chosen  uniformly  at  random  from  D ,  and 
the  dynamics  are  run  in  each  iteration  for  10,000  time  steps. 

The  jumps  considered  here  are  determined  first  by  generat¬ 
ing  uniform  random  variables  r  G  [0,/?]  and  0  G  [0,  2tc]  .  We  set 
the  x-coordinate  of  the  jump  to  be  rcos(0)  and  the  y-coordinate 
to  be  r  sin(0).  Note  that  this  is  not  a  uniform  sampling  of  the  disc 
of  radius  R;  it  is  biased  toward  the  center. 

Also,  in  Section  3,  a  triangle  inequality  is  used  to  show 
that  if  the  diameter  R  <  p/2,  the  noise  cannot  drive  agents  out 
of  their  mutual  sensing  radius  once  they  are  all  close  to  each 
other.  However,  the  triangle  inequality  is  a  notoriously  dull 
bound,  so  we  also  simulate  four  different  sizes  of  random  jumps 
R  =  0.5p,0.6p,0.8p,p  for  each  of  the  values  of  N  and  W.  That 
is,  the  smallest  jump  considered  is  the  one  proven  to  give  asymp¬ 
totic  convergence. 
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Figure  2.  The  number  of  agents  in  the  maximal  connected  component 
for  simulations  in  the  middle  sized  environment  (W  =  60).  The  medians 
of  100  iterations  are  plotted.  Each  panel  has  a  fixed  number  of  agents 
and  varies  the  jump  size.  Crosses  are  R  =  0.5p;  circles  are  R  =  0.6p; 
squares  are  R  =  0.8p;  and  x’s  mark  R  =  p.  Upper  left:  N  =  10.  Upper 
right:  N  =  20.  Lower  left:  N  =  40.  Lower  right:  N  =  100. 


Figure  3.  The  number  of  agents  in  the  maximal  connected  component 
for  simulations  in  the  largest  environment  (IT  =  100).  The  medians  of 
100  iterations  are  plotted.  Each  panel  has  a  fixed  number  of  agents  and 
varies  the  jump  size.  Crosses  are  R  =  0.5p;  circles  are  R  =  0.6p; 
squares  are  R  =  0.8p;  and  x’s  mark  R  =  p.  Upper  left:  N  =  10.  Upper 
right:  N  =  20.  Lower  left:  N  =  40.  Lower  right:  N  =  100. 


A  couple  of  general  patterns  emerge  from  the  simulations. 
First,  the  agents  that  are  initially  close  to  one  another  quickly 
aggregate  into  clusters.  Individual  agents  jump  around  the  do¬ 
main  in  steps  bounded  by  R ,  but  the  more  agents  in  a  cluster, 
the  more  the  the  random  jumps  are  averaged  out.  So  large  clus¬ 
ters  move  around  the  domain  more  slowly.  Thus,  the  dynamics 
act  very  much  like  a  coalescent  process  in  probability.  Small 
clusters  form  quickly,  but  as  the  clusters  get  larger  their  rate  of 
coalescence  slows  down.  At  each  time  slice,  the  figures  plot  the 
number  of  agents  in  the  maximum  connected  component  in  the 
median  simulation  run.  The  largest  component  then  has  a  fifty 
percent  chance  of  being  bigger  (or  smaller)  than  the  one  plotted. 

Second,  the  density  of  the  agents  in  the  domain  is  very  im¬ 
portant  to  the  control.  This  is  not  that  surprising,  but  if  we  com¬ 
pare,  say,  the  lower  right  panel  in  the  three  figures,  an  interesting 
pattern  is  noted  regarding  the  jump  sizes.  In  Fig.  1  (showing  the 
smallest  region  and  thus  the  most  densely  packed  agents),  the 
larger  jump  sizes  cause  more  agents  to  cluster  more  quickly.  In 
the  middle  sized  region  (Fig.  2),  the  largest  sized  jump  performs 
the  worst,  and  in  Fig.  3,  the  two  largest  jump  sizes  perform  the 
worst.  As  the  jump  size  increases,  the  easier  it  is  for  clusters  of 
agents  to  break  apart.  In  the  denser  regions,  large  random  jumps 
break  clusters  apart,  but  chances  are  the  scattered  bits  are  close 
to  other  clusters. 

4.2  Normally  distributed  initial  conditions 

In  this  section  we  perform  simulated  experiments  similar  to 
the  previous  subsection,  except  that  the  initial  positions  are  nor¬ 


mally  distributed.  It  is  easy  to  imagine  mission  scenarios  where 
normally  distributed  initial  conditions  are  realistic,  e.g.,  agents 
dropped  from  a  helicopter.  Since  there  will  be  a  greater  amount 
of  initial  clustering  about  the  mean,  we  were  curious  to  see  if  the 
larger  random  jumps  could  force  consensus  in  larger  domains 
(quite  unlike  the  case  with  uniform  initial  conditions). 

As  above,  we  simulate  100  realizations  of  the  dynamics  for 
10,000  time  steps.  Agents  are  placed  in  a  square  100  by  100 
region  (the  largest  considered  above)  with  initial  normally  dis¬ 
tributed  initial  conditions.  The  mean  of  the  distribution  was  set 
to  be  the  center  of  the  square,  and  two  standard  deviations  were 
checked:  five  and  ten  times  the  sensing  radius.  In  none  of  the 
simulations  was  an  agent  assigned  an  initial  placement  outside 
of  the  domain.  Finally,  the  two  largest  jump  sizes  were  simu¬ 
lated:  R  =  0.8p  and  R  =  p.  Comparisons  with  the  uniform  initial 
conditions  are  given  in  Fig.  4.  As  before,  the  median  maximum 
component  size  of  100  simulation  runs  is  plotted. 

5  Planned  Simulations  and  Experiments 

For  verification  of  the  methods  described  in  this  paper,  an 
experimental  setup  with  real  robots  is  employed.  The  arena  for 
robot  exploration  is  an  area  that  is  roughly  5 m  x  3.5 m.  The 
agents  in  this  experiment  are  Create  mobile  robots  from  iRobot. 
A  motion  capture  video  system  comprised  of  twelve  OptiTrack 
V100R2  cameras  and  the  Tracking  Tools  API  from  NaturalPoint 
is  utilized  to  provide  an  indoor  GPS  capability.  Each  Create 
robot  is  assigned  a  unique  pattern  of  reflective  markers  and  a 
corresponding  “rigid  body”  definition  in  the  software,  allowing 
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Figure  4.  The  number  of  agents  in  the  maximal  connected  component 
for  simulations  in  the  largest  environment  (W  =  100)  with  normally  dis¬ 
tributed  initial  conditions.  The  medians  of  100  iterations  are  plotted.  Each 
panel  has  a  fixed  number  of  agents  and  varies  the  jump  size.  Nor¬ 
mally  distributed  initial  conditions  are  connected  with  dashes  (upper  tra¬ 
jectories)  and  uniformly  distributed  initial  conditions  are  connected  with 
dots  (lower  trajectories,  identical  to  Fig.  3  lower  panels).  Squares  are 
R  =  0.8p;  and  x’s  mark  R  =  p.  Upper  left:  N  =  40  and  stan.  dev.  of 
initial  position  is  5 R.  Upper  right:  N  =  40  and  stan.  dev.  of  initial  position 
is  10/?.  Lower  left:  N  =  100  and  stan.  dev.  of  initial  position  is  5 R. 
Lower  right:  N  =  100  and  stan.  dev.  of  initial  position  is  10 R. 

reasonably  robust  tracking  of  the  agents  at  100Hz. 

These  robots  are  clearly  governed  by  dynamics  that  are  more 
complex  than  single-integrator  particles.  Experiments  will  be 
conducted  utilizing  real  on-board  cameras  and  scanning  laser 
rangefinders  (see,  e.g.,  [50,51])  to  ascertain  the  limitations  of 
this  controller  under  these  dynamics,  as  well  as  to  understand  the 
rate  of  convergence  under  these  considerations.  Additionally,  the 
probabilistic  functions  proposed  in  this  paper  will  be  empirically 
verified.  Specifically,  the  shapes  of  the  sensing  functions  in  two 
(or  more)  dimensions  must  be  addressed. 


6  Conclusion 

Swarm  agreement  under  certain  probabilistic  connectivity 
limitations  has  been  proven  to  be  asymptotically  convergent  for 
bounded  spaces.  A  controller  has  been  developed  that  conforms 
to  the  assumptions  in  the  proof,  and  simulations  verify  the  con¬ 
vergence  under  such  a  controller.  Planned  experiments  will  ad¬ 
ditionally  validate  these  results. 
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A. 2. 14  Aggregation  and  Rendezvous  in  an  Unbounded  Do¬ 
main  without  a  Shared  Coordinate  System 

The  following  paper  (on  the  next  6  pages)  was  co-authored  by  Bruce  Rogers, 
Gregory  K.  Fricke,  and  Devendra  P.  Garg,  and  appeared  in  the  Proceedings  of 
the  50th  IEEE  Conference  on  Decision  and  Control  and  European  Control  Con¬ 
ference  on  pages  1437  1442,  published  and  presented  in  December  2011. 
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Abstract — Many  swarm  robotics  problems  focus  on  the  details 
of  robotic  dynamics,  while  ignoring  certain  other  practical 
issues  such  as  the  boundedness  of  the  exploration  space,  the 
probability  of  unsuccessful  communication  or  sensing  between 
agents,  knowledge  of  a  common  coordinate  system,  or  the  initial 
distribution  of  robots.  A  two-phase  control  is  devised  which, 
under  appropriate  individual  bounds,  almost  surely  leads  to 
successful  rendezvous.  We  examine  a  likely  and  practical  initial 
distribution,  which  leads  to  an  exploration  protocol  wherein  an 
agent  is  obligated  to  stay  within  a  certain  range  of  its  initial 
position.  Furthermore,  the  rate  of  convergence  is  primarily 
limited  to  the  velocity  limitations  of  the  agents.  We  solve 
the  aggregation  and  rendezvous  problem  with  random  initial 
conditions  in  an  arbitrary,  unbounded  2D  domain  without  an  a 
priori  share  coordinate  system.  We  examine  rates  of  successful 
aggregation  under  certain  initial  distributions,  and  rates  of 
convergence  under  uncertain  communication. 

I.  Introduction 

The  aggregation  and  rendezvous  problems  are  basic  hur¬ 
dles  in  swarm  robotics  and  distributed  control  [1] — [3].  In 
aggregation,  the  goal  is  to  develop  a  distributed  control 
that  allows  an  arbitrary  number  of  agents  to  act  together  to 
achieve  some  task  [4],  [5].  For  the  rendezvous  problem,  the 
goal  is  for  the  agents  to  converge  to  a  common  position. 
We  assume  that  the  agents  do  not  have  a  shared  coordinate 
system  and  operate  in  an  unbounded  two-dimensional  do¬ 
main.  Also,  we  assume  communication  constraints  between 
the  agents  that  are  realistic  for  current  technology.  However, 
the  control  developed  here  is  high  level  and  rather  abstract; 
we  are  not  concerned  with  issues  like  obstacle  avoidance, 
and  the  agents  are  assumed  to  be  point  masses  operating  in 
discrete  time. 

In  contrast  to  our  assumptions,  rendezvous  is  relatively 
simple  with  a  shared  coordinate  system  and  no  bounds  on 
communication.  Each  agent  can  broadcast  its  position  to  the 
others;  they  calculate  the  average  position  and  move  to  that 
point.  If  communication  is  only  local,  averaging  algorithms 
may  produce  many  clusters  of  agents  instead  of  a  single 
swarm.  Without  shared  coordinates,  robots  must  first  sense 
the  position  of  neighbors  to  develop  a  shared  map.  In  this 
paper,  we  explore  a  two  stage  control  protocol,  first  utilizing 
random  motion  that  allows  the  agents  to  develop  shared 
coordinates.  The  agents  then  use  local  averaging  to  solve 
the  rendezvous  problem  under  some  realistic  constraints. 

Communication  restrictions  between  agents  depend  on 
mission  specific  parameters:  the  sensors  of  the  robots,  the 
size  of  the  domain,  etc.  We  make  the  following  assumptions 
on  the  communication  abilities  of  the  agents  and  their  ability 
to  infer  others’  positions. 


•  No  shared  map:  Agents  must  meet  or  sense  each  other 
in  order  to  share  coordinates. 

•  Sensing  radius:  Each  robot  is  equipped  with  a  sensor, 
such  as  a  camera  or  laser  range  finder.  Most  commonly, 
such  sensors  have  a  sweep  of  less  than  360  degrees,  and 
exhibit  sensing  errors. 

•  Simultaneous  communication:  Each  agent  can  main¬ 
tain  at  most  k  simultaneous  communications.  For  ex¬ 
ample,  standard  Bluetooth  technology  allows  k  =  7 
simultaneous  communications.  It’s  important  to  note 
that  it  is  not  necessary  to  keep  the  same  k  partners  over 
time. 

•  Probabilistic  communication  radius:  Robots  have  a 
limited  energy  source  and  may  be  examining  an  area 
much  larger  than  the  power  of  the  antennae  can  cover.  In 
general,  the  probability  of  a  successful  communication 
decreases  with  distance.  In  what  follows,  we  simplify 
matters  in  one  of  two  ways:  by  assuming  a  fixed  proba¬ 
bility  p,i  of  the  failure  of  each  communication  attempt, 
or  by  assuming  that  the  probability  of  a  successful 
communication  decays  with  distance. 

We  develop  a  control  where  agents  first  move  randomly  to 
create  a  cache  of  comrades  with  common  coordinates.  Each 
agent  then  selects  some  small  number  (bounded  by  k)  of 
individuals  from  its  cache  and  moves  to  the  average  position 
of  those  individuals.  In  Section  II  we  give  a  random  motion 
paradigm  that  supplies  the  agents  with  common  coordinates 
and  provides  bounds  on  the  time  through  simulations.  In 
Section  III  we  discuss  the  global  dynamics  of  the  averaging 
portion  of  the  control.  Section  IV  proposes  future  simulations 
and  experiments  and  offers  some  concluding  remarks. 

II.  Finding  other  agents  in  the  field 

The  agents  in  our  system  wish  to  aggregate,  but  there  is 
one  major  impediment:  They  do  not  have  a  shared  coordinate 
system.  They  must  sense  each  other  in  the  field  before 
communicating  their  positions  to  each  other;  it  would  be 
meaningless  otherwise.  Upon  sensing  each  other,  each  agent 
records  the  sensed  position  of  the  newly  met  agent;  in  the 
following  communication  step,  the  agents  share  their  own 
beliefs  of  their  current  and  starting  positions  within  their 
own  coordinate  frame.  In  this  way,  the  coordinate  frames  of 
all  agents  become  united.  In  fact,  upon  complete  successful 
aggregation,  each  agent  has  a  complete  map  of  the  starting 
points  of  the  entire  agent  set  relative  to  its  own  coordinate 
frame. 
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Let  Xi{t)  £  R2  be  the  position  of  agent  i  at  time  t. 
If  there  are  N  agents,  the  state  space  of  the  system  is  a 
subset  of  R2xJv.  In  applications,  the  state  vector  may  include 
additional  coordinates,  e.g.  velocity  or  orientation;  however, 
without  loss  of  generality,  we  treat  aggregation  as  a  purely 
positional  problem. 

We  suppose  that  each  agent  has  some  open,  bounded, 
simply  connected  sensing  domain  Di(t)  C  R2,  and  that 
the  shape  of  the  Di  is  the  same  for  each  i  and  all  times. 
Since  our  agents  are  point  masses,  we  assume  without  loss 
of  generality  that  i  £  Di(t)  for  all  i  and  t.  We  also  suppose 
the  sensors  are  less  than  perfect;  there  is  a  failure  probability 
ps  at  each  sensing  opportunity.  If  a ;,(i)  £  Dj(t )  at  time 
t,  agent  j  senses  i’s  presence  with  probability  1  —  ps.  If 
Xi(t)  £  Dj(t)  and  Xj(t)  £  Dft )  at  some  time  t,  then 
agents  i  and  j  exchange  coordinate  systems  and  add  each 
other’s  communication  address  (e.g.,  IP  address)  to  their 
phone  book  with  probability  (1—  ps)2.  Also,  since  the  agents 
can  remember  all  agents  they  have  previously  encountered, 
two  agents  exchange  not  just  their  own  coordinates  upon 
sensing,  but  also  the  entire  contents  of  their  address  and 
phone  books. 

At  the  abstract  level,  we  need  a  protocol  for  agents  to  wan¬ 
der  about  and  find  each  other.  However,  if  the  environment  is 
unbounded  and  the  agents  move  randomly  (e.g..  Brownian  or 
Levy  motion),  it’s  possible  some  of  the  agents  will  wander 
away  from  the  rest.  To  prevent  that  possibility,  we  give 
the  agents  a  wandering  radius  Wr .  The  agents’  motions  are 
restricted  so  that  |[xj(0)  —  a:*  (t)  ||  <  Wr  for  all  i,  which  is 
a  simple  control  to  implement  since  each  agent  has  its  own 
relative  coordinate  system.  Let  B,  be  the  ball  of  radius  Wr 
centered  at  £j(0);  this  is  agent  i’s  wandering  domain.  Also, 
let  D  be  the  support  of  the  initial  distribution  of  agents,  so 
that  Xj( 0)  ePcR2  for  all  i. 

There  is  a  natural  graph  theoretic  exposition  of  the  control. 
For  basic  graph  theory  definitions,  see  [6]  for  a  classical 
treatment  or  [7] — [1 1]  for  an  engineering  viewpoint.  We 
consider  two  different  networks.  First,  at  each  time  t,  there 
is  a  sensing  network  Qt  =  (V,Et)  where  the  agents  are 
the  set  of  nodes  and  i  is  connected  to  j  if  and  only  if 
Xi(t)  £  Dj(t)  and  Xj(t)  £  Di(t).  Because  distant  agents  may 
communicate  if  they  have  met  previously,  the  union  of  all 
previous  connections  is  key  to  aggregation.  Then  for  a  fixed 
time  T  >  0,  set  Q(T)  =  (V,  \jJ  Et).  That  is,  i  ~  j  in  Q(T)  if 
*  ~  j  in  Qt  for  any  t  <T.  The  idea  of  accumulating  a  set  of 
changing  edges  is  present  in  [14]  and  [12];  however,  in  those 
works  the  topology  of  the  network  does  not  explicitly  depend 
on  the  geometry  of  the  state  space.  Here,  the  cumulative 
sensing  graph  Q  ( T )  depends  on  how  the  wandering  domains 
Bi  overlap  since  two  agents  i  and  j  can  sense  each  other 
only  if  the  distance  between  Bi  and  B;)  is  small  enough 
(and  certainly  if  Bt  D  Bj  ^  0). 

With  this  in  mind,  we  next  define  the  network  Qb  where 
the  nodes  are  the  sets  Bi,  and  two  wandering  domains  are 
connected  if  BiCiBj  f  0.  Qb  is  sometimes  called  the  nerve  of 
the  cover  (J  B, .  The  following  proposition  essentially  follows 


from  the  definitions. 

Proposition  1:  Suppose  the  dynamics  of  the  individual 
agents  are  given  by  independent  stationary  ergodic  stochastic 
processes  such  that  the  support  of  the  stationary  distribution 
of  Xi(t)  is  Bj.  Then  if  the  network  of  the  wandering  domains 
Qb  is  connected,  for  any  e  >  0  there  is  a  (random)  finite  time 
r  such  that  the  probability  the  cumulative  sensing  network 
Q  (r)  is  connected  is  1  —  e. 

Proof:  It  suffices  to  show  that  two  agents  in  intersecting 
regions  will  sense  each  other  in  finite  time.  This  is  bound 
to  happen  by  the  recurrence  property  of  ergodicity.  Given 
any  disc  in  the  intersection  of  the  regions,  there  is  a  finite 
time  where  both  agents  will  be  in  any  disc.  Make  the  disc 
sufficiently  small  so  that  the  agents  must  sense  each  other.  If 
ps  >  0,  the  probability  of  sensing  goes  1  as  the  two  agents 
recurrently  meet  in  the  disc.  ■ 

The  “almost  surely”  in  the  proposition  is  with  respect  to 
the  stationary  measure.  Also,  the  wandering  domains  do  not 
need  to  be  discs,  but  we  use  discs  below  because  the  agents 
have  random  initial  orientations.  In  what  follows,  each  agent 
i  picks  a  point  uniformly  at  random  from  B,  and  moves 
there,  and  then  repeats.  These  dynamics  clearly  meet  the 
hypotheses  of  the  Proposition.  In  the  next  subsection,  we 
show  by  simulation  that  connection  can  be  achieved  in  a 
reasonable  amount  of  time. 

Once  Q  (t)  is  connected,  the  agents  have  essentially  solved 
the  aggregation  problem.  There  is  then  a  chain  of  contacts 
between  any  two  agents,  and  thus  any  two  agents  can 
pass  communications  through  that  chain.  Then  every  agent’s 
address  book  will  contain  the  phone  number  and  coordinate 
system  of  every  other  agent.  One  issue  is  to  develop  a 
communication  protocol  so  that  the  agents’  address  books 
are  built  up  over  time,  and  to  minimize  the  adverse  effects 
of  call  drops.  Another  issue  not  addressed  in  this  paper  is 
the  required  self-localization  of  each  agent  relative  to  its  own 
coordinate  frame;  there  are  many  methods  in  the  literature 
addressing  this  practicality  through  odometry  calibration  and 
estimation  [15],  SLAM  methods  [16],  [17],  or  (perhaps  most 
relevant  to  this  discussion)  relative  localization  [18],  [19]. 

A.  Simulations  for  aggregation  time 

In  this  section  we  wish  to  determine  how  long  it  will  take 
to  connect  the  network  Q  (t)  with  a  realistic  number  of  agents 
deployed  in  a  realistic  initial  domain  D.  We  focus  on  a  small 
team  of  agents,  10  <  N  <  20  with  initial  positions  drawn 
from  a  two-dimensional  normal  distribution.  For  TV  — >  oo, 
the  connectivity  of  Qb  is  well  understood  [13],  but  very  large 
systems  of  robots  are  not  feasible  at  this  time. 

We  first  need  to  determine  constraints  on  the  time  un¬ 
til  connectivity  in  best  conditions  before  considering  less 
ideal  sensing.  In  the  simulations,  we  assume  perfect,  omni¬ 
directional  sensing.  That  is,  each  agent  has  a  sensing  ra¬ 
dius  p,  and  two  agents  sense  each  other  at  time  t  if 
|| cc* (t)  —  Xj(t)  |  <  p.  This  represents  a  best  case  scenario 

of  the  discussion  above.  Without  loss  of  generality,  set  the 
sensing  radius  of  the  agents  to  unity,  p  =  1 . 
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Suppose  the  initial  position  of  each  agent  is  drawn  from 
a  normal  distribution  with  mean  at  the  origin.  The  standard 
deviation  in  the  x-coordinate  is  five  times  the  sensing  radius, 
and  the  standard  deviation  in  the  y-coordinate  is  the  same 
as  the  sensing  radius.  (The  covariance  between  x  and  y 
is  0).  Such  a  distribution  may  be  realistic  in  a  scenario 
where  the  agents  are,  say,  ground  robots  deployed  by  an 
aircraft  in  flight;  a  long,  thin  initial  distribution  makes  sense. 
Under  these  chosen  “mission  parameters”,  the  wandering 
radius  Wr  of  each  agent  is  set  to  14.  This  value  was 
chosen  so  that  the  wandering  domain  network  ( Qb  above) 
is  very  likely  to  be  connected.  In  simulations,  the  wandering 
domain  networks  were  not  connected  only  16  times  out  of 
11,000  instantiations.  Additional  experiments  were  run  for 
Wr  =  10,..., 20.  Clearly,  the  larger  the  radius,  the  more 
likely  Qb  is  to  be  connected,  but  also  the  longer  it  takes  for 
the  cumulative  sensing  network  to  become  connected  because 
each  agent  has  to  search  a  larger  area. 

With  mission  parameters  defined,  we  implement  a  simple 
agent  dynamics.  For  each  t  >  0,  Xi(t)  is  chosen  uniformly 
at  random  from  agent  i’s  wandering  domain.  The  index  t 
does  not  represent  time  in  the  sense  of  a  ticking  clock, 
but  rather  the  number  of  actions  the  agents  perform.  These 
dynamics  are  chosen  because  we  want  the  agents  to  aggregate 
as  quickly  as  possible,  but  they  can  sense  each  other  only 
if  they  are  near  enough  to  one  another;  independent  draws 
from  a  uniform  distribution  mix  the  agents  quickly.  Also, 
for  two  agents  it  is  easy  compute  the  mean  and  variance 
of  the  sensing  time.1  For  more  than  two  agents,  there  are 
dependencies  that  make  calculating  the  expected  connection 
time  of  Q(t)  difficult. 

Each  agent  moves  from  Xi(t)  to  Xi(t+1)  in  our  simulation 
by  utilizing  a  2m,-ordcr  unicycle  model,  i.e.,  one  that  is 
appropriate  for  control  of  a  two-wheeled  differential-drive 
mobile  robot.  At  each  simulation  step,  the  proximity  to  other 
agents  is  evaluated.  If  one  or  more  other  agents  is  within  the 
defined  sensing  range  p,  the  agents  are  said  to  have  met. 

Figure  1  shows  simulation  results  for  N  =  10, 11, . . . ,  20. 
For  the  initial  distribution  given  above  and  each  value  of  N, 
we  calculate  the  first  time  t  when  the  cumulative  sensing 
network  Q(t)  is  connected.  The  times  t  are  random,  so 
we  run  the  simulation  1000  iterations  for  each  value  of  N 
with  fresh  initial  conditions  every  iteration.  The  figure  shows 
the  50t?i,  75t?i,  90i?i  and  99th  percentiles  of  the  simulated 
aggregation  times,  as  well  as  the  worst  case. 

The  y-axis  in  Figure  1  measures  time  in  number  of  random 
positions  explored.  We  see  little  improvement  in  performance 
as  N  increases  from  10  to  20,  and  the  times  are  very  tightly 
distributed  about  the  median  (bottom  line).  Very  rarely, 
the  agents  take  longer  to  aggregate;  see  the  crossed  line 
corresponding  to  the  99th  percentile.  This  happens  when 
the  wandering  domains  have  very  small  overlap.  The  results 
are  promising:  in  90  percent  of  the  1000  cases,  only  about 

'The  sensing  time  is  a  Bernoulli  random  variable  with  probability 
proportional  to  the  areas  of  overlap. 
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Fig.  1.  Time  until  aggregation  for  N  =  10  to  20  agents.  The  time  is 
random,  so  1000  iterations  are  simulated  for  each  value  of  N,  and  percentiles 
are  reported.  The  time  units  are  number  of  positions  explored. 


10  positions  are  necessary  for  the  communication  graph  to 
become  connected. 


III.  Global  dynamics  of  averaging  protocol 


We  now  move  to  the  rendezvous  problem  with  commu¬ 
nication  constraints.  Once  the  agents  have  aggregated  and 
can  share  coordinates,  they  could  broadcast  their  positions 
and  rendezvous  at  the  mean.  However,  broadcasting  does  not 
necessarily  scale  with  the  number  of  agents,  whereas  small- 
scale  directed  messaging  does  [20],  If,  for  example,  there 
are  dozens  or  hundreds  of  agents,  each  agent  would  have  to 
receive  and  process  all  of  those  broadcast  messages,  or  there 
would  need  to  be  a  protocol  to  somehow  synchronize  the 
messages.  Alternatively,  if  each  agent  only  sends  directed 
messages  to  a  small  number  of  other  agents,  results  from 
random  graph  theory  confirm  that  the  typical  number  of 
incoming  requests  will  be  closely  related  to  that  small 
number;  thus  most  agents  will  have  a  tenable  number  of 
requests  to  which  to  respond.  As  noted  in  the  introduction  we 
assume  for  practicality  reasons  that  at  most  k  simultaneous 
communications  are  possible. 

We  wish  to  keep  communications  to  a  minimum  in  order 
to  minimize  the  detrimental  effects  of  dropped  calls.  Suppose 
each  agent  i  nominates  exactly  two  neighbors  i  \  and  i-i-  In 
this  section  we  consider  the  global  dynamics  from  the  local 
updating 


Xi(t  +  1) 
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So  at  each  time  step,  agent  i  goes  to  the  mean  of  its  two 
neighbors’  previous  positions.  In  the  limit  as  N  — >  oo,  these 
dynamics  are  known  to  contract  the  unit  disc,  but  the  situation 
is  less  clear  for  a  fixed  finite  number  of  agents. 
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A  key  practical  concern  is  how  the  agents  choose  their 
two  neighbors.  The  mathematical  ideal  is  for  each  agent  to 
choose  from  the  others  uniformly  without  replacement.  If  the 
agents  have  aggregated  as  in  the  previous  section,  they  have 
access  to  the  phone  numbers  of  all  the  other  agents  and  can 
choose  from  them  uniformly.  Even  in  the  ideal  case,  there 
is  no  mathematical  guarantee  that  the  positions  of  the  agents 
will  converge  to  a  common  value.  However,  for  large  N  with 
initial  positions  randomly  distributed  in  a  convex  subset  of 
R2,  there  are  several  heuristic  reasons  to  believe  the  dynamics 
in  (1)  will  cause  the  agents  to  converge.  First,  the  dynamics 
define  a  contraction,  so  the  most  distant  agents  are  bound 
to  come  closer  together.  Second,  since  each  agent  nominates 
exactly  two  neighbors,  random  graph  theory  tells  us  that  the 
network  of  connections  between  agents  will  have  a  unique 
giant  component  for  large  N.  That  is,  the  vast  majority  of 
agents  will  be  in  the  same  communication  class,  so  under 
(1)  the  positions  of  these  agents  will  converge  to  the  same 
value. 

Finally,  even  if  the  communication  network  has  many 
separate  communication  classes,  the  initial  conditions  are 
random,  so  the  law  of  large  numbers  says  that  each  com¬ 
munication  class  will  converge  to  a  point  near  the  mean 
of  the  distribution.  These  observations  fall  well  short  of 
rigorous  proof,  but  they  are  played  out  in  simulation.  We 
ran  1000  realizations  of  the  dynamics  with  N  =  10, . . . ,  20 
and  initial  conditions  uniformly  distributed  in  a  rectangle  of 
size  W  =  100  on  each  side.  These  initial  conditions  are 
more  dispersed  than  the  more  realistic  distribution  described 
in  the  previous  section,  so  convergence  should  be  less  likely 
and  slower.  After  25  time  steps  of  the  dynamics  from  (1)  we 
measured  whether  or  not  all  the  agents  settled  within  a  disc 
of  diameter  1.  If  not,  we  consider  this  a  failure  and  record 
the  failure  rates  in  the  first  row  of  Table  I.  For  the  sake  of 
space,  we  record  just  the  even  numbers  of  agents.  The  others 
are  similar. 

TABLE  I 

Number  of  Failures  in  1000  Sim  Runs. 


Number  of  agents 

to 

12 

14 

16 

18 

20 

Case  1 

29 

32 

25 

36 

21 

16 

Case  2 

0 

4 

8 

2 

0 

3 

The  second  row  shows  the  same  measurement,  but  with  a 
twist  on  the  dynamics.  Each  agent  picks  two  agents  randomly 
and  follows  (1)  for  10  time  steps.  Then  each  agent  picks  a 
fresh  pair  of  neighbors  and  follows  (1)  for  10  more  time 
steps.  In  the  first  case,  agents  in  over  95  percent  of  the 
simulations  converge  to  within  1  unit  of  each  other  after 
25  time  steps.  In  the  second  case,  agents  in  more  than  99 
percent  of  the  simulations  converge  to  within  1  unit  of  each 
other  after  just  20  time  steps.  As  in  the  previous  section,  time 
is  measured  in  calculations  performed,  and  the  convergence 
is  fast  enough  that  it  is  practically  limited  only  by  the  speed 
of  the  robot. 

Even  when  the  agents  do  not  converge  to  a  single  point 


within  25  times  steps  (reported  as  a  failure  in  the  table), 
they  come  together  into  several  clusters  that  lie  on  a  common 
line  in  the  plane.  Even  in  the  cases  of  “failure”  the  maximum 
distance  between  the  agents  is  less  than  3.  Also,  for  N  =  250 
or  500  we  found  that  the  dynamics  converge  to  within  a  disc 
of  diameter  1  in  every  case  out  of  2000. 

A.  Probabilistic  communication:  Fixed  probability 

We  now  consider  the  case  where  successful  communica¬ 
tion  during  the  rendezvous  phase  is  not  guaranteed.  Intu¬ 
itively,  this  should  decrease  the  rate  of  convergence  (and 
perhaps  the  likelihood  of  success).  To  investigate  this,  we 
modified  the  rendezvous  simulation  such  that  with  probability 
Pd  a  communication  attempt  at  a  given  time-step  will  be 
unsuccessful.  In  this  case,  a  “communication  attempt”  is  the 
request  from  one  agent  to  one  of  its  two  nominated  neighbors 
for  their  current  positions.  An  unsuccessful  attempt  means 
that  the  requested  information  is  not  returned.  If  only  one 
such  request  is  successful,  the  requesting  agent  will  move  to 
the  average  of  its  current  position  and  the  responding  agent’s 
position.  If  neither  request  is  successful,  the  requesting  agent 
does  not  move.  (Note  that  the  responding  agents  do  not 
take  this  communication  into  account  in  generating  their  own 
moves.) 

We  ran  three  sets  of  simulations  of  100  mns  each  for  values 
of  pd  =  {0.1, 0.5, 0.9}  and  N  =  20,  and  the  results  are 
shown  in  Figure  2.  The  y-axis  of  the  figure  is  the  maximum 
separation  between  any  two  agents.  The  plotted  results  give 
the  median  distances  at  the  given  time-step  for  each  of  the 
three  values  of  pd .  Additionally,  for  the  pd  =  0.9  case,  the 
25th  and  75th  percentile  values  are  shown. 

Astoundingly,  even  with  a  failure  rate  of  pd  =  0.5  (that  is, 
one  out  of  two  communication  attempts  is  unsuccessful),  the 
median  max-distance  is  effectively  zero  before  the  30th  time- 
step.  Compare  this  to  the  case  of  the  previous  subsection, 
i.e.,  perfect  communication,  where  we  mark  the  rendezvous 
successful  if  the  max  distance  is  <  1  at  completion  of  the 
25th  time-step.  As  expected,  the  rendezvous  requires  more 
time  under  decreased  probability  of  successful  communica¬ 
tion;  unexpectedly,  the  effect  seems  to  be  quite  mild.  Even  for 
the  case  of  pd  =  0.9,  the  rendezvous  is  successful.  The  rate 
is  clearly  much  slower  in  this  case,  but  successful  rendezvous 
can  still  be  expected.  These  results  illustrate  a  lower  bound 
for  the  time  required  for  rendezvous  under  very  ineffective 
communication. 

B.  Probabilistic  communication:  Range-dependence 

We  continue  this  investigation  by  making  a  more  realistic 
assumption  regarding  the  communication:  the  probability  of 
successful  communication  decreases  with  increased  range. 
We  know  well  that  the  power  level  of  radio  frequency  signals 
drops  off  as  the  square  of  the  distance,  so  we  might  similarly 
expect  that  digital  signals  are  less  likely  to  be  successfully 
received  at  greater  distances. 

With  this  motivation,  we  define  the  probability  of  success¬ 
ful  communication  as  a  function  of  the  distance  d  between 


1440 


120 


100 


+  Drop  prob  =  0.9 

o  Drop  prob  =  0.5 

*  Drop  prob  =  0.1 


0) 

o 

c 

aJ 


s 

5 


60  70  80  90  100 


Fig.  2.  Convergence  rates  subjected  to  various  probabilities  of  successful 
communication. 


any  two  agents  attempting  to  communicate.  For  simplicity, 
we  choose  the  sigmoid  function  shown  in  (2),  where  Cr  is 
the  communication  radius  and  a  is  a  shaping  parameter  that 
defines  the  sharpness  of  the  dropoff. 

Pd(d)  —  1  —  1  ea/2-d*a/Cr 

Choosing  W  =  100  (as  previously),  Cr  =  0.75 W  and  a  =  5, 
we  have  pd(0)  =  0.924  and  pd(V2W)  =  0.000979  (where 
y/2 W  is  the  max  separation  any  two  agents  could  experience 
in  this  scenario). 

We  repeated  the  simulations  for  convergence  using  this 
function  for  communication  probability.  In  10,000  iterations, 
only  two  instantiations  failed  to  converge  within  100  time 
steps.  The  results  of  these  simulations  are  given  in  Fig.  3 
with  the  crossed  points  indicating  the  median  max-separation 
between  any  two  agents,  and  the  dotted  lines  indicating  the 
10th  and  90*/l  percentiles.  From  the  figure,  it  is  clear  that 
the  convergence  rates  under  this  assumed  function  maintain 
the  convergence  rates  at  roughly  those  of  the  case  where 
all  communication  is  subjected  to  a  50%  probability  of 
failure  (regardless  of  distance).  Furthermore  the  results  are 
very  promising  in  the  very  tight  distribution  of  convergence 
rate.  It  is  likely  that  the  shape  of  this  convergence  is  highly 
dependent  on  the  shape  of  the  communication  function  (2), 
specifically  on  the  sharpness  of  the  cutoff. 

IV.  Discussion  and  future  directions 

We  have  considered  distributed  controls  that  are  functions 
only  of  an  agent’s  position.  Real  robots  move  in  continuous 
time  and  continuous  space,  have  limitations  on  their  ability  to 
dead-reckon  their  own  positions,  and  have  limitations  in  their 
ability  to  consistently  and  accurately  measure  the  positions 
of  other  agents.  In  the  future  we  seek  to  further  refine  our 
simulations  to  address  these  issues  in  more  realistic  and 
practical  terms. 
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Fig.  3.  Convergence  rates  subjected  to  the  probability  of  successful 
communication  defined  as  a  function  of  distance. 


A  serious  challenge  is  to  rendezvous  a  relatively  small 
number  of  agents  in  a  perhaps  vast  environment.  We  have 
given  a  two  stage  process  where  agents  first  find  a  shared 
coordinate  systems  and  are  able  to  communicate  with  each 
other.  Then,  each  agent  moves  directly  between  two  other 
agents  in  order  to  rendezvous  at  a  common  point.  We  choose 
only  two  others  in  order  to  limit  dropped  communications 
due  to  technology  and  protocol  constraints. 

We  have  been  silent  on  how  the  agents  know  when  to 
switch  from  one  dynamics  protocol  to  the  other.  One  way  is 
to  suppose  the  agents  know  how  many  other  agents  there 
are,  so  each  one  has  an  address  book  with  N  —  1  slots. 
When  an  agent’s  address  book  becomes  full,  it  votes  to  switch 
from  the  random  dynamics  of  Section  II  to  the  deterministic 
dynamics  of  Section  III.  Once  the  vote  is  unanimous,  the 
switch  is  made.  This,  however,  adds  requirements  to  the 
communication  protocol. 

A  simple  alternative  method  of  deciding  when  to  switch 
to  the  rendezvous  protocol  is  to  capitalize  on  the  results  of 
Section  II.  Specifically,  we  see  that,  for  10  or  more  agents 
utilizing  this  protocol,  arbitrarily  setting  the  time-to-switch 
to,  say,  t  =  100  means  that  with  high  confidence  the  address 
books  of  all  agents  will  be  full.  As  the  number  of  agents 
increases  within  the  same  distribution  (i.e.,  the  initial  density 
increases)  the  number  of  time-steps  additionally  decreases. 
Clearly  then,  and  especially  within  the  mindset  of  swarming 
robotics  wherein  exactness  is  of  less  importance  than  gen¬ 
eralized  robustness,  a  time  could  be  chosen  within  which 
some  minimum  number  of  agents  (with  a  perhaps  larger  set 
of  deployed  agents)  aggregate  and  rendezvous  with  a  dictated 
likelihood  appropriate  for  the  given  mission. 
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A. 2. 15  Swarm  Formations  under  Nonholonomic  and  Nu- 
merosity  Constraints 

The  following  paper  (on  the  next  7  pages)  was  co-authored  by  Gregory  K.  Fricke, 
Kevin  M.  Lieberman,  and  Devendra  P.  Garg,  and  appeared  in  the  Proceedings 
of  the  5th  ASME  Dynamic  Systems  and  Control  Conference,  published  and 
presented  in  October  2012. 
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ABSTRACT 

In  this  paper,  we  propose  a  fully  distributed,  scalable  method 
of  controlling  agents  with  nonholonomic  constraints  using  a 
Morse  potential  function.  This  method  successfully  controls  a 
swarm  of  differential-drive  (unicycle-type)  agents  to  stable  and 
predictable  formations  whose  structures  are  not  defined  a  priori. 
The  system  achieves  a  stable,  minimal  energy  state. 

We  consider  the  effect  of  numerosity  constraints,  as  ob¬ 
served  in  birds  and  fish  in  their  shoaling  and  flocking  behav¬ 
ior  as  a  mechanism  of  reducing  complexity,  in  the  interest  of 
achieving  fully  distributed  control  over  a  swarm  of  any  size.  The 
application  of  numerosity  constraints  to  a  swarm  system  allows 
the  swarm  to  grow  without  bound  and  with  no  increase  in  re¬ 
quired  processing  capability  of  the  individual  agents.  We  explore 
this  parameter  as  a  method  of  minimizing  processing  and  storage 
requirements  while  still  achieving  the  qualitative  swarm  perfor¬ 
mance.  Results  from  simulations  are  given  for  swarms  ranging 
in  size  over  N={6,. . .  ,100}  acting  under  our  proposed  controller 
as  applied  to  differential-drive  (unicycle-type)  robots. 

1  INTRODUCTION 

Distributed  control  of  agents  in  a  swarm  is  desirable  for  high 
robustness  in  dynamic  environments  and  against  individual  agent 
failures.  In  many  applications  that  employ  multiple  autonomous 
agents,  the  ability  of  the  agents  to  move  in  formation  is  highly 
desirable,  even  if  the  formation  structure  is  not  explicitly  given. 
This  is  certainly  true  for  environmental  exploration  and  coverage 
applications  (see,  e.g.,  [  1 — 4]),  or  escort  applications  [5,6].  Sta¬ 
ble  formation  control  can  reduce  the  occurrence  of  agents  getting 
“lost”  or  separated  from  the  group.  In  this  paper,  we  examine  the 
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effect  of  “numerosity  constraints”  on  the  dynamic  formation  be¬ 
havior  of  a  group  of  differential-drive  ground  robots  under  fully 
distributed  control. 

The  concept  of  a  “numerosity  constraint”  [7, 8]  is  important 
to  consider;  evidence  from  natural  systems  has  shown  that  both 
birds  [9]  and  fish  [10, 11]  exhibit  such  constraints  in  their  shoal¬ 
ing  and  flocking  behavior.  Clearly,  in  a  murmuration  of  starlings 
each  individual  cannot  keep  track  of  the  entire  flock  that  num¬ 
bers  in  the  thousands,  nor  can  an  individual  fish  keep  track  of 
the  entirety  of  its  school.  Each  individual  is  thus  only  aware  of 
a  finite  number  of  its  nearest  neighbors,  making  its  own  motive 
decisions  based  only  on  this  input.  This  ability  is  clearly  desir¬ 
able  in  a  scalable  application  of  multiple  mobile  robots,  allowing 
each  individual  to  bound  its  input  requirements  (and  accordingly, 
its  memory  and  processing  requirements  as  well  as  its  communi¬ 
cation  or  sensing  requirements)  regardless  of  the  overall  size  of 
the  swarm. 

1.1  Background 

Formation  control  of  a  group  of  mobile  robotic  agents  con¬ 
tinues  to  advance.  Research  in  formation  control  has  been  widely 
reported  by,  among  many  others,  Lewis  and  Tan  [12],  Balch  and 
Arkin  [13],  Desai  et  al  [14,  15],  Sugar  and  Kumar  [16],  Yam- 
ague  hi  and  Burdick  [17],  Egerstedt  and  Hu  [18],  Fierro  et  al  [19], 
Olfati-Saber,  Fax  and  Murray  [20-24],  Lawton  et  al  [25],  Tanner 
et  al  [26,27],  Mai  and  Lian  [28],  and  Viguria  and  Howard  [29], 
In  many  cases,  roboticists  look  to  nature  for  inspiration  (see, 
e.g.,  [30-32]).  Various  approaches  are  employed  in  controlling 
formations,  from  those  that  are  rigidly  defined  (i.e.,  all  inter¬ 
agent  distances  and/or  angles  are  specified  a  priori )  to  those  that 
exploit  inter-agent  distances  or  configurations  that  naturally  lead 
to  agreement  (i.e.,  potential  functions). 
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A  major  advantage  of  potential  function-based  approaches  is 
that  the  individual  agents  don’t  require  “labeling”,  i.e.,  the  agents 
are  identity  agnostic.  This  greatly  reduces  the  requisite  sensing 
ability  as  well  as  processing  and  memory  requirements,  and  may 
remove  communication  requirements  (for  control)  altogether. 

Another  significant  (and  in  certain  ways  more  desirable)  ad¬ 
vantage  is  robustness  in  the  face  of  agent  failure.  If  some  in¬ 
dividual  agent  in  a  swarm  of  N  agents  is  somehow  compro¬ 
mised  (i.e.,  disabled,  damaged,  or  outright  removed)  the  remain¬ 
ing  agents  simply  achieve  (smoothly)  the  appropriate  minimum- 
energy  state  for  (N  —  1)  agents. 

The  resultant  structures  arise  naturally  from  the  minimum- 
energy  framework.  This  may  be  disadvantageous  in  certain  cases 
where  one  or  more  agents  have  differing  capabilities,  and  where 
certain  agents  should  occupy  specific  positions  within  the  forma¬ 
tion.  This  limitation  can  be  overcome  if  those  specialized  agents 
use  slightly  different  shaping  parameters  in  their  own  potential 
functions. 


agent  j  to  agent  i  is  dif  U  (i)  is  the  total  potential  energy  between 
agent  i  and  its  neighbors  j  £  {J\[L,  separated  by  Euclidean  distances 
dij  =  1 1  1 1 2 -  The  attractive  and  repulsive  “strengths”  are,  respec¬ 

tively,  Cfl  and  Cr,  and  the  attractive  and  repulsive  length  scales 
are,  respectively,  la  and  lr.  These  shaping  constants  are  restricted 
to  positive  real  numbers,  and  la  ^  l,-. 


U{i)  =  Y,[-Cae~dii/la  +Cre~d‘J/lr]  (1) 

m 

The  minimum  energy  for  a  pair  of  agents  under  this  potential 
function  occurs  at  separation  Jm!n,  defined  in  Eq.  (2). 
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1.2  Outline 

The  remainder  of  this  paper  is  organized  as  follows.  Sec¬ 
tion  2  defines  the  problem  and  the  proposed  solution.  Sec¬ 
tion  3  describes  the  simulation-based  validation  of  our  proposed 
method,  and  presents  results  with  and  without  numerosity  con¬ 
straints  relative  to  predicted  minimum  values.  Section  4  provides 
discussion  of  the  limitations,  future  work,  and  practical  consid¬ 
erations,  and  in  Section  5  the  paper  is  concluded. 


2  PROBLEM  STATEMENT 

Given  a  set  of  N  agents  in  an  appropriately  compact  space 
and  initially  distributed  such  that  the  overall  network  is  con¬ 
nected,  identify  a  distributed  controller: 

-  whose  implementation  is  entirely  scalable; 

-  that  results  in  a  stable  formation  for  coverage; 

-  that  is  feasible  for  agents  with  nonholonomic  constraints; 

-  that  exhibits  smooth  behavior,  and; 

-  that  is  robust  to  sensing  or  communication  uncertainty. 

The  assumption  that  the  swarm  of  agents  is  initially  con¬ 
nected  is  an  important  one.  For  methods  to  ensure  that  agents 
are  deployed  and  can  find  each  other,  the  reader  is  directed  to 
[33,34], 

2.1  Proposed  Solution 

We  propose  a  distributed  controller  for  a  swarm  of  N 
differential-drive  mobile  robots  moving  under  a  potential  func¬ 
tion  with  a  numerosity  constraint  N  such  that  1  <  N  <  (A  —  1). 
(The  only  case  where  N  =  1  is  feasible  is  for  N  =  2.)  The  nearest 
N  agents  to  agent  i  are  defined  as  the  neighbors  of  agent  i,  and 
make  up  the  set 

We  start  with  a  Morse  potential  function  as  implemented  by 
Chuang  et  al  [35]  as  given  in  Eq.  (1).  The  vector  distance  from 


The  shaping  constants  are  restricted  as  follows: 


hi  d1  l  r 
la  h- 


h'  d'  r  :>  h  (  'a 

1  iX  V  <  :  h  C  'a 


The  controller  for  agent  i  uses  a  typical  gradient-descent 
method  for  potential  function  controllers,  as  shown  in  Eq.  (3), 
where  Fj  is  the  vector  of  forces  and  is  of  the  same  dimensionality 
as  dtj. 


Fi  =  -VU(i) 


(3) 


In  the  simple  case  of  an  agent  capable  of  motion  and  acceler¬ 
ation  in  any  direction,  this  potential  function  easily  accomplishes 
a  stable  tiling  of  agents.  This  type  of  motion  is  rarely  realizable 
in  practice,  however,  so  we  must  consider  a  method  to  re-orient 
the  agents  to  overcome  relevant  nonholonomic  constraints.  The 
dynamic  model  we  consider  in  this  paper  is  that  of  the  ubiquitous 
differential -drive  robot.  The  state  of  agent  i  is  defined  as: 


A,  = 


Xi 

yi 

e 

Vi 

CO, 


(4) 


where  the  pair  (x,-,y,)  represents  the  world-coordinates  of  the 
agent  in  the  plane,  0,  is  the  agent’s  heading  with  0,  =  0  corre¬ 
sponding  to  the  i-axis,  and  v,  and  CO,  are,  respectively,  the  linear 
and  angular  speeds. 
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As  the  system’s  potential  energy  is  applied  to  the  agents  as 
force  they  will  accelerate,  converting  the  potential  energy  into  ki¬ 
netic  energy.  In  order  to  allow  the  system  to  dissipate  the  energy, 
cv  and  Co,  are  included  as  non-conservative  damping  terms.  The 
control  dynamics  are  then  given  by  Eq.  (5)  and  Eq.  (6),  where  a,- 
is  the  linear  acceleration  and  a,  is  the  angular  acceleration. 


U{i)=Ud(i)  +  Ue{i) 

Ud{i )  =  £  \-Cae~d^  +Cre-di^] 

FH.  (8) 

t/9(0  =  CpWFiUl  -cos0rfp(O)  +  \cQ{0,  -0,)2 


Xi  =  f(Xi)+g(Xi)  (5) 


m) 


v,cos(0,) 

0 

v,sin(0,) 

0 

CO, 

;  g(Xi)  = 

0 

0 

Cli  -  V,'Cv 

0 

OC/  CG/Cq) 

(6) 


Correspondingly,  the  controller  in  Eq.  (3)  is  modified  as  fol¬ 
lows.  First,  F(i)  =  —VUd(i).  Another  feedback  term  is  added 
in  the  form  of  torque.  (As  above,  the  angular  acceleration  is 
a,-  =  I;-  x,,  but  we  set  the  inertia  I,  equal  to  identity.)  The  resul¬ 
tant  angular  acceleration,  a,-,  is  defined  in  Eq.  (9). 


a,  =CP  [Fy  (i)  cos  0,-  -  Fx  (i )  sin  0,-] 

+Ce(4(£  ei0o/fv]-eo 

FH. 


(9) 


We  modify  Eq.  (3)  in  acknowledgment  of  the  lack  of  force- 
authority  in  the  direction  perpendicular  to  0  due  to  the  nonholo- 
nomic  constraint.  Instead,  we  set  the  control  force  Ft  as  the  scalar 
product  F,  =  ( Fj ,  0,),  i.e.,  the  projection  of  the  force  vector  F\  onto 
the  (unit-length)  heading  of  the  robot.  For  the  case  of  agents  re¬ 
stricted  to  the  plane,  F  =  [Fx,Fy]T,  and  F  =  ||F||2.  The  linear 
acceleration  is,  of  course,  a,-  =  Fj/nij  where  nij  is  the  mass  of 
agent  i;  without  loss  of  generality,  we  set  the  mass  of  the  agents 
m j  =  1. 


a,-  =  Fx(i)  cos  0,-  +  Fy(i)  sin  0,  (7) 

This  projection  of  force  appropriately  addresses  the  non- 
holonomic  constraint  inasmuch  as  it  prevents  the  agent  from 
moving  in  a  manner  that  is  not  physically  possible.  However,  the 
agent  must  have  some  ability  to  change  its  orientation  to  move  in 
the  appropriate  direction  to  its  ideal  position  relative  to  its  neigh¬ 
bors. 

The  additional  control  required  for  alignment  is  imple¬ 
mented  as  a  combination  of  state-averaging  with  neighbors  and 
an  “aligning  torque”  resulting  from  the  artificial  forces  acting  on 
the  robot.  The  state-averaging  term  is  akin  to  torsional  springs 
between  agent  i  and  its  neighbors,  whereas  the  aligning  torque 
is  analogous  to  that  acting  on  a  electric  dipole  in  the  presence 
of  an  electric  field.  In  this  way,  we  extend  the  equation  of  en¬ 
ergy  to  include  this  angular  information,  as  shown  in  Eq.  (8) 
with  the  addition  of  two  terms.  The  first  term  reflects  the  en¬ 
ergy  of  the  agent’s  orientation  with  respect  to  the  potential  field 
Qdp(i)  =  arctan(Fv,-,Fv  i)  —  0,,  and  Cp  is  the  strength  of  the  po¬ 
lar  moment  (which  is  used  as  a  tuning  parameter).  The  second 
term  captures  the  error  between  the  orientation  of  agent  i  and  the 
average  orientation,  0,,  of  its  neighboring  agents  j. 


3  SIMULATION  OF  SYSTEM  ENERGIES 

In  this  section  we  examine  how  well  the  system  approaches 
the  minimum  energy  state.  We  model  teams  of  agents  with  sizes 
of  N  =  6,7, . . .  ,20,25, . . .  ,50, 100.  The  initial  positions  for  each 
simulation  instance  are  drawn  from  a  two-dimensional  uniform 
distribution,  in  which  the  agents  are  guaranteed  to  be  “close”  to 
one  another.  (For  additional  discussion  of  swarming  conditions 
under  which  this  can  be  “almost  guaranteed”  to  be  connected, 
see  e.g.  [33,34].) 

Each  agent  moves  in  our  simulation  under  the  2"d-order  uni¬ 
cycle  dynamic  model  given  above  in  Eq.  (4),  i.e.,  one  that  is  ap¬ 
propriate  for  control  of  a  two-wheeled  differential-drive  mobile 
robot.  At  each  simulation  step,  the  proximity  to  other  agents  is 
evaluated.  The  neighbor-set  for  each  agent  is  defined  as  the  near¬ 
est  A  agents.  The  simulations  are  executed  under  the  continuous¬ 
time  ODE45  method  within  MATLAB. 

For  each  value  of  N,  100  iterations  of  500  seconds  was 
run  for  values  of  Ca  =  250,  CV  =  240.  la  =  21  m,lr  =  15m, Q  = 
10, Cp  =  l, cv  =  \/2/2,  Co,  =  2.  We  compare  the  results  for  N  =  5 
and  N  =  N. 

Figure  1  shows  the  energetic  results  from  the  simulations 
described  above.  The  large  black  dots  indicate  the  minimum 
achievable  potential  energy  for  a  planar  system  of  N  agents.  The 
small  green  dots  indicate  the  initial  potential  energy  for  each  run, 
and  the  small  red  dots  (slightly  offset  to  the  right  for  each  value 
of  N)  indicate  the  final  potential  energy. 

The  minimum  achievable  energy  is  found  numerically,  treat¬ 
ing  all  agents  as  particles  with  no  motion  constraints,  but  spaced 
according  to  the  potential  function.  As  this  is  a  numerical  ap¬ 
proximation,  the  minimum  energy  may  not  be  accurate  espe¬ 
cially  for  larger  N.  This  is  confirmed  by  the  evidence  that  for  a 
small  number  of  cases,  the  dynamic  simulation  achieves  a  some¬ 
what  lower  energy  than  the  predicted  minimum. 
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(a)  Final  system  energy  without  numerosity  constraints. 


10 


x  105  Nu=5,  Ca=250,  Cr=240,  La=21,  Lr=15 


10  10‘ 

Number  of  agents,  N 

(b)  System  energy  under  numerosity  constraint  of  IV  =  5. 

Figure  1 :  Final  system  energy  with  and  without  numerosity  con¬ 
straints  for  swarms  of  various  sizes. 


These  figures  may  be  somewhat  misleading  in  a  few  cases, 
apparently  showing  that  the  final  potential  energy  in  the  system 
is  larger  than  the  initial  energy,  where  the  Ustart  values  (in  green, 
and  slightly  to  the  left  for  each  case)  are  higher  than  the  U Final 
values  (in  red,  and  slightly  to  the  right).  The  reader  is  reminded 
that  these  results  are  displayed  en  mass',  the  select  cases  where 
U Final  values  are  higher  for  given  N  than  U start  are  from  different 


simulation  instances.  For  every  individual  case,  the  final  total 
energy  ( U Final  +  kinetic  energy)  is  lower  than  the  initial  potential 
energy. 

It  is  apparent  that  the  systems  with  numerosity  constraints 
do  not  achieve  the  estimated  system  minimum  potential  en¬ 
ergy.  That  estimated  potential  is  generated  for  situations  where 
the  swarm  agents  are  each  aware  of  all  other  agents  (i.e.,  no 
numerosity  constraint).  Thus  it  is  true  that  the  numerosity- 
constrained  swarm  does  not  achieve  the  same  value  of  minimum 
energy.  To  explain  this,  we  bring  to  the  reader’s  attention  the  X 
and  Y  scales  on  the  two  figures. 

The  unconstrained  case  achieves  a  significantly  denser  struc¬ 
ture  due  to  the  fact  that  each  agent  is  aware  of  all  other  agents; 
thus,  the  presence  of  distant  agents  actually  the  swarm  spacing. 
In  Figure  2a,  the  agents  are  in  general  separated  by  3m  to  4m, 
whereas  in  Figure  2b  the  agents  are  separated  by  12 m  to  13 m.  In 
fact,  for  the  parameters  used  in  these  simulations,  the  minimum- 
energy  spacing  for  two  agents  by  themselves  (as  calculated  from 
Eq.  2)  is  approximately  15.5m.  This  reveals  that  by  applying 
numerosity  constraints,  the  typical  spacing  of  agents  can  be  reg¬ 
ulated  to  “near”  a  design  point,  dmjn,  regardless  of  the  overall  size 
of  the  swarm. 

Ultimately,  Figure  2  illustrates  that  the  overall  formation  in 
both  cases  is  more  or  less  properly  structured.  Additional  nu¬ 
merical  analysis  is  underway  to  fully  characterize  the  error  in  the 
formation  between  the  constrained  and  unconstrained  types. 

It  should  be  noted  that  in  these  figures,  the  absolute  positions 
and  absolute  orientations  are  not  critical;  this  is  the  very  nature 
of  seeking  such  semi-stable  configurations.  Only  the  relative  po¬ 
sitions  and  orientations  are  of  importance. 


4  DISCUSSION  AND  FUTURE  DIRECTIONS 

These  simulations  assume  full,  2jc  sensing  awareness.  This 
is  achievable,  but  not  always  practical.  Future  investigation  in¬ 
cludes  limited  FOV  for  the  sensors.  Additionally,  we  implicitly 
assume  an  infinite  sensing  range;  that  is,  each  agent  always  has 
information  from  its  N  nearest  neighbors,  regardless  of  their  ac¬ 
tual  separation.  At  first  glance,  this  appears  to  be  an  egregious 
assumption;  however,  in  conducting  the  simulations  without  this 
constraint  in  place,  we  can  empirically  observe  the  separations 
that  actually  do  occur,  and  can  either  tune  the  potential  function 
parameters  such  that  agents  stay  within  an  appropriate  (practical) 
sensing  (or  communication  range),  or  apply  a  requirement  to  real 
robots  such  that  their  sensing  range  is  sufficient  to  maintain  con¬ 
nectivity. 

Additional  refinement  is  necessary  to  determine  whether 
there  is  a  “sweet  spot”  of  numerosity  -  a  particular  value,  or  a 
ratio  of  the  overall  size  perhaps.  Furthermore  the  gain  values 
used  in  the  simulation  (Ca ,  C, .  l„ .  lr .  C, ,  Cp ,  cv ,  c(l))  were  selected 
in  a  trial-and-error  fashion;  further  tuning  of  these  gains  may 
yield  further  insights. 

In  reviewing  the  dynamic  results  of  the  simulations,  it  is  ob¬ 
served  that  in  some  cases,  one  or  two  agents  take  large  angu- 
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(b)  With  numerosity  constraint  N  =  5. 


Figure  2:  Final  positions  and  orientations  of  a  swarm  of  100 
agents. 


lar  velocities  -  larger  than  practically  feasibly  for  typical  ground 
robots  -  indicating  that  proper  the  tuning  of  Cp,Ce,  and  ca  is 
important. 


communication  with  multiple  agents.  Bluetooth  allows  a  client- 
server  relationship  with  up  to  6  clients.  IEEE802. 1 1  is  practically 
unlimited  given  one  or  more  access  points  either  in  the  environ¬ 
ment  or  carried  by  an  agent.  This  method  doesn’t  even  require 
communication,  though;  sensing  alone  can  provide  the  requisite 
information  for  this  controller. 

Ongoing  experiments  by  the  authors  at  the  Duke  RAMALab 
have  shown  that  visual  detection  and  position  estimation  of  up 
to  10  neighbors  with  simple  devices  (e.g.,  off-the-shelf  web¬ 
cams)  is  achievable  at  0.5-lHz  even  with  the  limiting  process¬ 
ing  power  of  600MHz  ARM5  processors  and  only  64MB  of 
system  RAM.  An  alternate  implementation  on  Dell  Mini  10  Net- 
books  (1.6GHz  Atom  processor,  1GB  RAM)  using  the  popular 
OpenCV  vision  processing  package  [36,  37]  and  the  extension 
for  color  blob-finding  [38]  implemented  in  Python  is  capable  of 
detecting  “many”  neighboring  RAMAbots  (augmented  with  ap¬ 
propriate  colors)  at  up  to  1 1Hz.  Such  detection  and  measurement 
does  exhibit  some  error,  which  is  not  yet  included  in  these  simu¬ 
lations;  further  investigation  is  continuing. 

We  do  not  consider  the  practical  consideration  of  occlu¬ 
sion.  In  most  line-of-sight  sensing  applications,  agents  cannot 
see  through  one  another;  thus  in  the  practical  sense,  the  agents 
are  constrained  to  truly  react  to  their  A  nearest  visible  neighbors. 
There  may  be  neighboring  agents  that  are  physically  closer  but 
cannot  affect  the  potential  function  because  it  is  not  sensed.  Once 
the  formation  structure  begins  to  take  shape,  though,  occlusion 
should  not  be  an  issue  if  N  is  sufficiently  small. 

The  simulation  results  presented  here  do  not  explicitly  place 
acceleration  or  speed  limits  (either  angular  or  linear)  on  the 
agents.  Continuing  work  in  both  simulation  and  experiment  ad¬ 
dresses  these  concerns;  these  initial  results  confirm  that  the  con¬ 
troller  we  propose  is  feasible  with  “ideal”  agents,  and  our  ongo¬ 
ing  effort  is  focused  on  such  practical  considerations. 


5  CONCLUSION 

We  have  presented  a  controller  that  yields  a  stable  formation 
for  agents  with  nonholonomic  constraints.  We  additionally  allow 
a  constraint  to  be  placed  on  the  agents  restricting  their  awareness 
of  the  bulk  of  the  swarm  in  order  to  allow  scalability,  and  we 
have  shown  that  this  constraint  numerically  reduces  the  ability  of 
the  swarm  to  achieve  its  minimum  possible  energy;  even  under 
this  constraint,  however,  the  swarm  is  still  able  to  significantly 
reduce  its  energy  relative  to  the  starting  energy,  and  successfully 
achieves  formation  control  in  a  predictable  way. 


4.1  Implementation  in  Practice 

We  assume  communication  or  sensing  constraints  between 
the  agents  that  are  realistic  for  current  technology.  It  is  not  un¬ 
reasonable  to  assume  that  an  individual  agent  could  maintain 
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ABSTRACT 

This  paper  presents  a  decentralized  control  algorithm  for 
the  Entrapment/Escorting  Mission  based  on  the  Morse 
potential  function.  The  entrapment/escorting  mission  tasks  a 
multi-agent  system  with  creating  a  formation  around  a  mobile 
target.  Applications  of  the  mission  include  security  awareness 
and  military  situations  where  an  autonomous  object  of  value 
must  be  guarded.  The  simulation  consists  ofN  robots  that  are 
initially  located  in  random  positions  on  aflat  terrain.  Agents 
are  provided  with  the  information  on  coordinates  of  their  peers 
by  a  supervisor,  which  are  used  to  generate  artificial  potential 
energies  between  robots  and  the  target.  The  simulation 
confirms  that  robot  positions  achieve  the  lowest  calculated 
Morse  potential  energy  in  the  arena,  and  the  resulting 
formation  is  centered  on  the  target.  Differential  drive  “e-puck'’ 
robots  are  used  to  experimentally  validate  the  control's 
application  to  nonholonomic  vehicles,  and  proximity  sensors 
are  incorporated  into  the  algorithm  to  accommodate  for 
obstacles  in  the  experimental  arena. 

INTRODUCTION 

Swarm  robotics  research  is  based  on  a  collection  of  robots, 
rather  than  a  single  agent,  accomplishing  a  specified  task. 
Multi-agent  systems  offer  better  dexterity  through  more 
controllable  components  than  a  single  agent,  and  the  effects  of 
failure  can  be  minimized  since  there  are  multiple  agents 
redundantly  working  to  complete  the  same  task.  While 
traditional  robotics  has  emphasized  dull,  dirty,  and  dangerous 
applications  [1],  swarm  robotics  offers  a  variety  of  applications 


that  range  from  services  to  life  saving  operations.  Commercial 
applications  include  autonomous  robots  that  move  goods  across 
warehouse  distribution  centers  [2],  and  ongoing  research 
challenges  a  swarm  of  robots  to  detect  mines  in  a  minefield  [3]. 

Scalability  and  hardware  limitations  are  obstacles  to 
implementing  a  multi-agent  system.  A  robust  system  must  be 
scalable  to  accommodate  for  the  addition,  withdrawal,  and 
failure  of  robots.  However,  inter-agent  communication  can  be 
complex,  especially  when  communication  is  not  centralized  or 
the  system  includes  heterogeneous  robot  configurations  [4]. 
Swarm  robots  that  are  small  and  depend  on  batteries  will  also 
have  relatively  little  processing  power  and  memory,  restricting 
their  abilities. 

A  major  distinction  in  multi-agent  systems  can  be  found 
between  centralized  control  and  decentralized  control  systems. 
Centralized  control  entails  communication  and  direction  to 
come  from  a  “centralized”  controller,  whereas  decentralized 
control  means  that  each  robot  is  independently  controlled.  As 
explained  in  Ref.  [5],  centralized  control  has  the  benefit  of 
being  able  to  monitor  the  entire  system.  Since  processing  is 
performed  off  board,  less  processing  equipment  is  required  on 
the  agent  and  the  cost  of  each  individual  agent  is  decreased. 
However,  centralized  control  is  hindered  by  high 
communication  requirements  and  limited  processing  power. 
The  number  of  agents  the  system  can  accommodate  is  finite  as 
it  is  restricted  to  the  controller’s  hardware  limitations.  Unlike 
centralized  systems  where  the  entire  system  risks  failure  should 
the  central  controller  fail,  decentralized  systems  equip  each 
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robot  with  the  sensors  and  actuators  to  independently  assert 
control  over  itself.  Consequently,  one  agent’s  failure  does  not 
cause  the  failure  of  the  whole  system.  The  overall  performance 
may  be  decreased,  but  the  task  can  still  be  achieved.  More 
robots  can  be  used  since  the  number  of  agents  is  not  limited  by 
the  capabilities  of  the  central  controller.  Trade-offs  are  made  as 
independent  processing  results  in  increased  costs  and 
complexity. 

The  emphasis  of  this  research  project  is  on  the 
entrapment/escorting  mission  [6].  In  this  mission,  multiple 
robots  surround  an  identified  target  and  are  tasked  with 
minimizing  access  or  escapement  paths  between  the  target  and 
intruders.  Entrapment  entails  preventing  the  target  from 
escaping  the  group  of  agents;  escorting  conversely  demands 
that  the  target  is  guarded  from  intruders.  The  motion  of  the 
target  is  not  known  a  priori  by  the  guarding  agents.  Solutions 
to  the  mission  have  been  primarily  offered  through  centralized 
control,  such  as  through  null-space-based  behavioral  (NSB) 
control  [6],  [7],  or  cluster  space  control  [8].  Reference  [9] 
presents  a  multi-layer  control  scheme  for  centralized  formation 
control,  while  generating  fictitious  forces  to  accommodate  for 
obstacles.  As  will  be  discussed,  experimental  robot  limitations 
in  this  research  violated  the  assumption  in  Ref.  [9]  that  agents 
had  sensors  that  could  measure  the  distance  to  obstacles. 
Therefore,  a  simplified  fictitious  force  calculation  is  developed 
in  this  paper  to  accommodate  for  the  limitations  of  the 
experimental  robots. 

Decentralized  solutions  have  focused  on  overall  system 
aggregation.  Reference  [10]  provides  decentralized  control  via 
algebraic  graph  theory  to  a  variation  of  the 
entrapment/escorting  mission.  In  the  modified  version  of  the 
entrapment/escorting  mission  referred  to  as  the  “surrounding 
control  problem,”  agents  seek  to  contain  an  additional  set  of 
stationary  robots.  The  focus  of  Ref.  [10]  is  to  design  a  control 
algorithm  based  on  a  decentralized  estimation  of  the  geometric 
center.  In  another  modified  scenario,  Ref.  [11]  challenges  a 
miniature  helicopter  to  follow  a  group  of  unmanned  ground 
vehicles  (UGV’s).  While  the  UGV’s  moved  according  to  a 
centralized  control  algorithm  presented  in  Ref.  [9],  the 
helicopter’s  onboard  artificial  vision  system  simultaneously 
calculates  the  centroid  of  the  UGV’s  to  guide  the  helicopter’s 
trajectory.  The  overall  system  is  decentralized  because  the 
UGV  and  helicopter  controllers  work  independently.  The  focus 
of  the  research  presented  in  this  paper  is  on  the  control  of  each 
agent,  rather  than  the  details  of  overall  system  aggregation  as 
found  in  Ref.  [10]  and  [11].  Centroid  calculations  and 
estimation  techniques  are  not  considered,  and  the  target’s 
motion  is  dynamic  and  unknown. 

Assumptions  made  about  the  mission  in  this  paper  differ 
from  the  assumptions  made  in  past  research.  While  Ref.  [12] 
examines  agent  formation  around  a  target,  it  focuses  on  how  an 
agent  can  identify  the  target  and  communicate  with  the  rest  of 
the  system  to  inform  their  formation.  This  paper,  unlike  Ref. 
[12],  assumes  that  all  agents  are  connected,  and  is  therefore  not 


as  concerned  about  agent  communication.  Additionally,  this 
research  does  not  make  the  assumption  that  agents  cannot  fail. 
Reference  [13]  provides  decentralized  control  using  grasp 
theory  to  constrain  a  target.  Agents  move  together  in  a  chain  in 
pursuit  of  the  target  before  enclosing  it.  However, 
nonholonomic  constraints  are  not  considered,  as  they  are  here. 

This  paper  proposes,  simulates,  and  experimentally 
validates  the  use  of  the  Morse  potential  function  to  provide 
decentralized  control  to  complete  the  entrapment/escorting 
mission.  It  should  be  noted  that  the  control  algorithm  is 
specifically  based  on  the  Morse  potential  (that  models  the 
potential  energy  of  a  diatomic  molecule),  rather  than  the  more 
general  Morse  class  of  navigation  potential  functions. 

Control  objectives  include  surrounding  the  target,  and 
minimizing  gaps  through  evenly  spacing  agents  around  the 
target.  Obstacle  avoidance  via  onboard  proximity  sensors  is 
incorporated  into  the  algorithm.  The  algorithm  is  scalable  and 
accommodates  failure.  This  methodology  builds  on  the  work  of 
Ref.  [14]  and  Ref.  [15],  which  used  quadratic  artificial  potential 
functions  to  simulate  formations  in  an  arena  with  assets,  threats, 
and  nodes  (bodyguard  agents).  Furthermore,  this  research 
experimentally  implements  decentralized  control  for 
nonholonomic  vehicles  with  obstacle  avoidance. 

The  entrapment/escorting  mission  has  a  great  deal  of  value 
for  military  and  security  applications  that  require  situational 
awareness.  Given  that  the  Department  of  Defense  has  mandated 
that  one-third  of  all  ground  combat  vehicles  be  unmanned  by 
2015  [1],  there  remain  ethical  questions  of  the  use  of 
autonomous  robots  in  a  war  zone.  The  entrapment/escorting 
mission  replaces  guarding  tasks  typically  controlled  by  humans 
with  autonomous  controllers,  and  the  question  remains  whether 
robots  or  remote  human  controllers  can  arrive  at  the  same 
decision  as  an  on-site  human  in  all  situations. 

BACKGROUND 

Multi-robot  control  frequently  draws  on  biological  systems 
to  inspire  algorithms  for  coordination  and  formation  [16,  17]. 
Potential  energy  functions  have  also  inspired  decentralized 
control  systems  by  reducing  robot  behavior  to  that  of  particles 
[18].  By  assigning  potential  energies  between  robots  in  the 
arena,  agents  naturally  move  to  points  of  equilibrium  at 
localized  energy  minima.  Scaling  potential  coefficients  can 
easily  change  the  size  of  a  swarm  and  offset  the  equilibrium 
points.  Since  potential  energy  equations  are  continuous,  control 
is  smooth  and  does  not  involve  toggling  between  on  and  off. 

This  research  uses  the  Morse  potential  [18,  19]  to  provide 
control.  The  Morse  potential  represents  the  attractive  and 
repulsive  interaction  between  two  objects,  originally  used  to 
model  the  spacing  between  nuclei  in  a  diatomic  molecule. 
Applications  of  the  Morse  potential  function  to  model  behavior 
are  quite  varied,  including  modeling  crowd  dynamics  in 
emergency  situations  [20].  The  expression  for  the  Morse 

potential  is  shown  in  Eq.  1.  In  the  planar  case,  xt  =  [  1, 
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in  3-D  case. 


In  this  work,  the  planar  case  is 


t/Oi)  =  (cre-\\x~xi\\/lr  -  Cae-llXi~xlll/ia)  (1) 

The  scalar  parameters  la  and  Ir  represent  the  attractive  and 
repulsive  characteristic  lengths,  and  Ca  and  Cr  are  the 
respective  coefficients  that  scale  the  potential.  The 
characteristics  lengths  determine  the  spacing  between  agents 
upon  reaching  equilibrium.  The  position  of  agent  i  is 
represented  by  x„  whereas  x,  represents  the  position(s)  of 
neighbor(s)  of  agent  i. 

As  shown  in  Eq.  2,  the  negative  gradient  of  the  Morse 
potential  energy  is  the  force  exerted  on  each  agent. 

Hxt)  =  —  VU(Xt)  (2) 

When  two  agents  are  too  close  to  each  other,  they 
experience  a  repulsive  force  as  dictated  by  the  gradient  that 
pulls  them  away  from  each  other.  When  the  agents  are  far 
apart,  the  force  causes  them  to  accelerate  towards  each  other. 
This  attractive  force  causes  agents  that  are  dispersed  throughout 
the  arena  to  sense  an  attractive  force  to  each  other,  resulting  in  a 
gathering  movement.  The  repulsive  force  prevents  robots  from 
colliding  into  each  other  or  obstacles.  Agents  cease  their 
acceleration  and  settle  into  the  formation  when  the  sum  of  the 
repulsive  and  attractive  energies  comes  to  a  minimum,  resulting 
in  a  gradient  (and  net  force)  of  zero. 

The  Morse  potential  function  is  only  dependent  on  an 
agent’s  location  and  the  location  of  its  peers.  By  providing  an 
agent  with  this  positioning  information,  each  robot  can 
individually  calculate  the  sum  of  the  artificial  forces  it 
experiences.  This  net  force  then  dictates  robot  acceleration  and 
movement.  Assuming  each  agent  can  obtain  the  relevant 
position  information  for  the  system,  agent  formation  and 
decentralized  control  can  be  driven  by  the  Morse  potential 
function.  No  agent-to-agent  communication  is  required,  and 
the  only  communication  necessary  is  the  positional 
information.  Alternatively,  this  positional  information  can  be 
acquired  through  sensing,  eliminating  communication 
requirements  altogether.  Furthermore,  the  system  is  scalable. 
Agents  added  to  the  system  (or  removed  from  it)  are  summed 
with  (or  subtracted  from)  the  system  energy. 

PROBLEM  FORMULATION 

This  research  seeks  to  experimentally  validate 
decentralized  control  for  the  entrapment/escorting  mission 
using  the  Morse  potential.  As  dictated  by  the  mission,  agents 
must  surround  a  target  without  colliding  into  each  other  or  the 
target  itself.  Additionally,  the  agents  must  evenly  space 
themselves  around  the  target.  The  terrain  is  assumed  to  be  a 
flat  arena  with  no  holes. 


The  control  algorithm  also  accommodates  for  obstacles 
identified  via  onboard  proximity  sensors.  Glossy  white  boxes 
are  randomly  placed  in  the  arena,  challenging  agents  to 
navigate  around  the  obstacles.  In  the  experiments,  obstacle 
positions  are  neither  recorded  nor  shared  between  agents. 

An  agent  with  randomized  motion  was  designated  as  the 
target  for  the  entrapment/escorting  mission  experiment.  Past 
research  has  used  a  tennis  ball  as  a  target  [6]  or  a  target  that 
emits  signals  received  by  agents  within  a  sensing  radius  [18]. 
Reference  [14]  identifies  various  methodologies  for 
determining  a  target’s  motion,  such  as  “randomized  waypoints” 
that  creates  a  random  set  of  destinations  for  the  robot,  or 
“random  direction”  that  randomly  assigns  a  heading  for  the 
robot  every  time  it  reaches  the  arena’s  boundary.  In  this 
experiment,  the  target’s  controller  randomly  generates 
coordinates  for  waypoints.  Upon  arrival  at  a  waypoint,  the 
target  generates  its  next  waypoint,  drawing  from  a  uniform 
distribution  over  the  traversable  area.  It  is  assumed  that  the 
target  would  not  attempt  to  navigate  to  an  unreachable 
destination.  Therefore,  the  situation  when  the  randomly 
generated  waypoint  is  coincident  with  an  obstacle  is  not 
considered. 

The  primary  assumption  in  this  experiment  is  that  agents 
know  the  positioning  information  of  all  robots  in  the  system 
(including  the  target).  In  practical  applications,  this 
information  could  be  relayed  by  a  GPS  or  satellite  system.  Yet 
it  should  be  noted  that  there  is  not  a  central  controller 
governing  the  agents’  movement.  All  agents  in  the  simulation 
and  experiment  are  homogeneous,  nonholonomic  vehicles. 

In  the  experiments,  there  was  a  single  target  agent  in  the 
arena  accompanied  by  four  additional  agents  responsible  for 
surrounding  the  target.  The  control  method  was  simulated  and 
experimentally  validated  with  e-puck  robots.  The  e-puck 
robots  were  created  by  the  Ecole  Polytechnique  Federale  de 
Lausanne  (EPFL)  [21],  and  were  selected  for  their  simple 
communication  capabilities  via  Bluetooth  and  low  cost.  The  e- 
puck  uses  differential-drive  with  a  stepper  motor  to  control 
each  wheel,  and  thus  has  nonholonomic  constraints. 
Communication  between  e-pucks  is  not  possible  without 
extensive  customization,  reinforcing  the  need  for  a 
decentralized  solution  to  the  entrapment/escorting  mission.  The 
eight  proximity  sensors  distributed  around  the  perimeter  of 
each  e-puck  are  not  accurate  enough  to  provide  reliable 
distance  readings  across  a  large  range  of  distances  [5],  limiting 
its  functionality.  However,  the  proximity  sensors  can  indicate 
whether  an  obstacle  is  immediately  present  in  front  of  a  sensor. 
As  a  consequence  of  this  behavior,  all  collision  avoidance  in  the 
model  utilized  the  proximity  sensors  as  a  detection  switch 
rather  than  a  detection  measurement. 

ANALYTICAL  MODELING 

The  decentralized  controller  modeled  in  this  paper  features 
primary  movement  determined  by  the  Morse  potential  function. 
The  gradient  of  the  Morse  potential  in  Eq.  1  leads  to  the 
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modeled  force  and  acceleration  that  are  exerted  on  the  agent. 
The  attractive  and  repulsive  forces  between  the  target  and  each 
agent’s  peers  determine  the  motion  of  the  robot.  Since  a 
formation  is  desired  around  the  target,  an  additional  attractive 
force  towards  the  target  is  added  to  each  agent  [19].  This 
results  in  a  total  potential  energy  as  seen  in  Eq.  3. 

U(xd  =  \cc{xt  -y)2  + 

Y!j=1  [Cre~ \\x~xiVlr  -  Cae-Wx-xiVla)  (3) 

The  magnitude  of  the  attractive  potential  energy  to  the 
target  is  tunable  with  scalar  Cc,  x,  is  the  position  of  the  target, 
and  Cr  and  Ca  scale  the  respective  repulsive  and  attractive 
forces  between  peer  agents. 

As  was  shown  in  Eq.  2,  the  gradient  of  the  Morse  potential 
can  be  taken  to  find  the  modeled  force  that  the  agent 
experiences  in  the  x  and  y  directions.  This  results  in  Eq.  4, 
where  dx  =  (xj  —  xt),  and  dr  =  y/(dx)2  +  (dy)2. 


dv: 

m  ~dt  = 


=  Cc{xL  -  xt )  + 


if: 


Cndx 


Crdx 


j*i  \  (la)(dr)e(ia)  (7r)(dr)e(<r) 


+  (cc{xi-xt)+^i  CadX(dr, - ^W)/  (4) 

V  (Za)(dr)e^aJ  (ir)(dr)evfr’J/ 

This  equation  verifies  that  the  agent  will  find  equilibrium 
when  the  potential  energy  U  is  at  a  minimum.  Since  the 
minimum  of  potential  energy  is  found  when  its  gradient  is  equal 
to  zero,  the  agent  would  not  accelerate  when  the  potential 
energy  is  at  a  local  minimum. 


psk  = 


0  if  kth  proximity  sensor  <  threshold 
1  if  kth  proximity  sensor  >  threshold 


(5) 


If  any  of  the  proximity  sensors  are  above  the  threshold 
value,  the  controller  generates  an  obstacle  repulsive  force 
utilizing  a  matrix  that  transforms  the  angle  of  each  proximity 
sensor  to  the  global  reference  frame.  This  is  seen  in  Eq.  6  and 
Eq.  7.  Since  the  proposed  method  is  designed  for  an  e-puck 
robot,  the  angles  of  the  proximity  sensors  locally  on  the  e-puck 
are  included  in  the  equations.  Cobs  is  a  coefficient  that  scales 
the  magnitude  of  the  repulsive  force. 


robs.x.i 
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'  cos(0j  —  10°  +  offset ) 
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cos  (0j  —  210°  —  offset ) 
cos(0j  —  270°  —  offset ) 
cos(0j  —  315°  —  offset ) 
_cos(0j  —  350°  —  offset ) 
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'  sin(0j  —  10°  +  offset ) 
sin(0j  —  45°  +  offset ) 
sin(0j  —  90°  +  offset ) 
sin(0j  —  150°  +  offset ) 
sin(0j  —  210°  —  offset ) 
sin(0j  —  270°  —  offset ) 
sin(0j  —  315°  —  offset ) 
_sin(0j  —  350°  —  offset ) 


(7) 


Equation  8  subsequently  determines  the  desired  velocity 
and  position  of  the  i-th  agent  based  on  the  generated  forces, 
where  Eq.  9  defines  the  velocity  of  that  agent  from  the  previous 
time  step. 


While  this  potential  function  does  not  consider  obstacles 
impeding  agent  motion,  onboard  proximity  sensors  provide 
additional  guidance.  Since  the  quality  of  the  e-puck  proximity 
sensors  was  found  to  not  reliably  indicate  the  distance  of 
perceived  obstacles,  the  proximity  sensors  acted  as  a  switch  to 
indicate  whether  an  obstacle  was  immediately  present.  A 
threshold  value  was  determined,  and  when  a  sensor  value 
surpassed  the  threshold,  a  repulsive  force  was  applied.  When 
the  agent  moved  away  from  the  obstacle  and  proximity  sensor 
values  dropped  below  the  threshold,  the  agent  returned  to  the 
potential  function-based  controller. 

The  obstacle  repulsive  force  generated  was  in  the 
opposite  direction  of  the  obstacle.  Elowever,  a  slight  offset  was 
subtracted  from  the  vertical  away  from  the  obstacle.  This  offset 
force  allowed  the  agent  to  move  away  from  the  obstacle  while 
simultaneously  advancing  around  the  obstacles  perimeter.  In 
Eq.  5,  the  e-puck  generates  a  matrix  indicating  which  proximity 
sensor  values  surpass  the  threshold. 


v  —  Vi  +  Av;  — 

m 
A  Xj 

Vi  —  — ~ 

1  At 


(8) 

(9) 


The  modeled  mass,  in,  is  used  as  a  gain  for  control 
purposes,  and  At  is  the  discrete  time  to  complete  a  loop. 

Since  the  agents  used  in  this  research  are  nonholonomic, 
the  agent’s  desired  velocity  must  be  converted  into  individual 
speeds  for  each  wheel.  The  velocity  vector  is  first  converted 
into  polar  form  in  Eq.  10  and  Eq.  11. 


^ norm.i  4"  ^y,i  0^) 

6desired,i  =  tan'  ‘  ( Vy,j  /  Vx,i)  (11) 

The  desired  heading  ( ddesmdj)  is  the  inverse  tangent  of  the 
two  components  of  the  desired  velocity.  The  heading  error  is 
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the  difference  between  the  desired  heading  and  the  agent’s 
current  heading  ( #). 


terror,  i  0desired,i  -0i  (12) 

Angular  velocity  can  be  computed  using  Eq.  13,  where  k  is 
a  gain. 

-Q,  =  k  *  (0  error,!  /At)  (13) 

In  the  model,  the  gain  was  scaled  so  it  would  be 
proportional  to  the  magnitude  of  the  artificial  force  and  the 
heading  error. 

Upon  calculating  the  magnitude  of  the  velocity  and  the 
angular  velocity,  the  values  are  multiplied  by  a  transformation 
matrix  that  provides  the  linear  velocity  for  each  wheel  in  Eq. 
14. 


( fright ) 


ill  — fc/21  rVnorml 
r[i  b/2  L  \ 


(14) 


The  product  of  the  angular  velocity  of  each  wheel,  co,  is 
determined  by  the  radius  of  the  wheel,  r,  the  distance  between 
the  wheels,  b ,  and  the  solutions  to  Eq.  10  and  Eq.  13. 


SIMULATION 

The  described  model  was  simulated  using  a  commercially 
available  package  Webots,  marketed  by  Cyberbotics  Ltd.  Three 
simulations  were  performed  with  five  e-pucks  in  the  arena.  One 
of  the  e-pucks  was  designated  as  the  target.  The  first  instance 
simulated  an  arena  devoid  of  obstacles,  and  the  second 
simulation  featured  two  obstacles  placed  in  the  middle  of  the 
arena  as  shown  in  Fig.  1.  The  third  simulation  mimicked  the 
first  simulation  with  no  obstacles;  however,  one  agent  was 
programmed  to  temporarily  “fail”  for  20  seconds  to 
demonstrate  the  robustness  of  the  system.  The  agent  ceased 
movement  as  the  other  agents  continued  to  move  without  being 
disturbed  by  the  failed  robot. 


FIGURE  1.  SIMULATED  ARENA  WITH  OBSTACLES 


In  each  simulation,  agents  settled  on  a  formation  as  shown 
in  Fig  2.  Upon  creation  of  the  formation,  agents  remained 
equidistantly  spaced  around  the  target,  as  shown  in  Fig.  3.  As 
noted  by  the  arrows  in  the  figure,  there  was  a  momentary  shift 
in  formation  when  the  target  switched  direction,  causing  each 
agent  to  change  their  orientation  accordingly.  Equidistant 
spacing  was  simultaneously  achieved  when  the  agents  reached 
their  minimum  Morse  potential  energy,  as  noted  in  Fig.  4.  The 
agents  asymptotically  approached  energy  levels  of  0.3  x  10"6  J, 
and  maintained  a  distance  of  approximately  20cm  between  the 
centroid  of  each  agent.  Figure  5  depicts  the  potential  energy  of 
an  agent  at  points  across  the  arena.  The  actual  position  of  the 
agent  marked  in  white  shows  that  it  is  located  at  its  lowest 
potential  energy  in  the  arena  while  in  the  formation. 


FIGURE  2.  SIMULATED  AGENT  EQUILIBRIUM  FORMATION 


Distance  to  Target  vs.  Time 


FIGURE  3.  SIMULATED  DISTANCE  BETWEEN  AGENTS  AND 
TARGET  VS.  TIME 
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x  io’6  Potential  Energy  vs.  Time 


FIGURE  4.  SIMULATED  AGENT  MORSE  POTENTIAL 
ENERGY 


x  10^  Potential  Energy  of  Agent  Across  Entire  Arena 


x  (cm) 


FIGURE  5.  POTENTIAL  ENERGY  OF  AGENT  ACROSS  THE 
ENTIRE  ARENA 

In  the  second  simulation,  agents  identified  obstacles  and 
navigated  around  them  due  to  the  obstacle  repulsive  force. 
Agents  periodically  got  stuck  in  local  minima  and  would  not 
advance.  Since  the  system  was  dynamic,  agents  moved  out  of 
the  local  minima  when  the  rest  of  the  system  changed  positions 
and  exerted  different  forces  on  the  stuck  agent. 

The  third  simulation  demonstrated  the  system’s  ability  to 
accommodate  for  failure.  When  an  agent  stopped  moving,  the 
remaining  agents  continued  along  their  normal  trajectories. 
However,  agents  redistributed  themselves  to  create  a  triangular 
formation  around  the  target  as  shown  in  Fig.  6,  rather  than  the 
previous  square  formation.  The  triangular  formation  minimized 
gaps  between  the  robots  in  order  to  compensate  for  the  missing 
agent. 


FIGURE  6.  SIMULATED  FORMATION  WITH  ONE  AGENT 
FAILURE 

EXPERIMENTAL  SETUP 

The  experiment  was  conducted  in  the  Robotics  and 
Manufacturing  Automation  (RAMA)  Laboratory  at  Duke 
University.  As  seen  in  Fig.  7,  the  arena  consisted  of  a  flat 
testbed  measuring  120cm  by  240cm.  Reflective  markers  were 
uniquely  fastened  to  each  of  the  five  e-puck  robots  in  the  arena. 
Three  OptiTrack  V100:R2  cameras  (not  visible  in  Fig.  7)  were 
mounted  above  the  arena  and  identified  the  markers.  A 
computer  using  the  NaturalPoint  Tracking  Tools  software  then 
processed  the  camera  images.  Through  the  program,  the  spatial 
locations  of  the  e-pucks  were  identified  and  streamed  over  the 
network.  Thus,  the  OptiTrack  system  served  as  a  surrogate 
GPS  system  that  could  provide  positioning  information  to  each 
of  the  e-pucks.  By  using  the  OptiTrack  system  in  the 
experiment,  the  assumption  was  made  that  all  agents  had 
previously  been  identified  and  that  the  system  was  calibrated 
correctly. 


FIGURE  7.  DUKE  RAMA  LAB  ARENA 


Since  the  e-puck  robots  have  little  processing  power,  all 
calculations  were  performed  by  an  offboard  computer.  Five 
computers  running  MATLAB  executed  the  computations 
during  the  experiment;  each  computer  controlling  one  e-puck. 
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The  computers  communicated  with  their  designated  e-puck 
through  a  bluetooth  connection.  While  the  calculations  were 
not  performed  on  board  the  e-puck,  it  is  important  to  note  that 
this  is  not  a  centralized  control  system.  All  computations  were 
performed  individually  and  paths  were  calculated 
independently. 

Proximity  sensor  information  was  transmitted  from  each  e- 
puck  to  its  respective  computer  during  each  control  loop. 
Calibration  tests  were  regularly  performed  before  each 
experiment  to  increase  proximity  sensor  consistency.  After  the 
proximity  sensor  and  Morse  potential  calculations  were  made, 
individual  wheel  speeds  were  communicated  to  the  agent,  who 
in  turn  actuated  them. 

At  the  beginning  of  the  experiment,  the  five  e-pucks  were 
randomly  placed  across  the  arena.  The  e-pucks  started  with  no 
initial  velocity.  After  establishing  connection  to  their  respective 
computers,  the  controllers  were  simultaneously  triggered  by  a 
clap.  Each  closed-loop  controller  ran  until  terminated  by  a 
user-defined  timer. 

One  e-puck  was  designated  as  the  target  during  the 
experiment.  To  model  random  motion,  its  controller  consisted 
of  a  MATLAB  program  that  randomly  generated  a  set  of 
coordinates  in  the  arena.  The  target  then  moved  towards  those 
coordinates  and  was  assigned  a  new  destination  upon  arrival. 
The  remaining  agents  used  a  decentralized  controller  based  on 
the  model. 

EXPERIMENTAL  RESULTS 

Plots  for  the  entrapment/escorting  mission  experiments  in 
an  obstacle-free  arena  are  shown  in  Fig.  8  and  Fig.  9.  They 
correspond  to  the  plots  generated  during  the  simulation.  The 
experiment  validates  the  simulation’s  calculation  that  agents 
will  maintain  a  distance  approximately  20  cm  apart  from  each 
other,  and  the  Morse  potential  energy  will  approach  a  value  of 
0.3  x  10"6  J.  The  equilibrium  formation  of  the  agents  in  an 
obstacle  free  arena  is  shown  in  an  overhead  photograph  in  Fig. 
10. 


FIGURE  8.  EXPERIMENTAL  DISTANCE  BETWEEN  AGENTS 
AND  TARGET  VS.  TIME 


FIGURE  9.  EXPERIMENTAL  AGENT  MORSE  POTENTIAL 
ENERGY 


FIGURE  10.  EQUILIBRIUM  FORMATION 


To  prevent  controllers  from  setting  dangerously  high  wheel 
speeds,  a  velocity  cap  was  placed  on  each  wheel.  This  limit  can 
be  visibly  seen  in  Fig.  8,  when  agents  begin  their  movement  to 
the  target  at  a  constant  slope.  As  indicated  by  the  arrow  in  Fig. 
8,  agents  temporarily  broke  their  formation  as  the  target  moved 
close  to  the  boundary,  forcing  agents  to  move  away  from  the 
arena’s  walls  to  avoid  collisions. 

Additional  testing  found  that  agents  were  able  to 
successfully  redistribute  themselves  into  a  triangularly  shaped 
formation  when  an  agent  intentionally  failed.  The  method 
proposed  for  the  e-puck  sensors  was  successful.  Flowever,  it 
should  be  noted  that  all  obstacles  in  the  arena  were  glossy  and 
white.  Since  the  surfaces  were  able  to  strongly  reflect  the 
signals  from  the  proximity  sensors,  their  performance  was 
expected  to  be  better  than  a  dark  obstacle  that  would  have 
absorbed  the  infrared  light  from  the  proximity  sensors. 

CONCLUSION 

This  paper  models,  simulates,  and  experimentally  validates 
that  the  Morse  potential  function  can  be  successfully  used  to 
exert  decentralized  control  during  the  entrapment/escorting 
mission.  Agents  were  able  to  follow  a  moving  target  and 
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evenly  distribute  themselves,  even  when  a  robot  failed. 
Experimental  testing  successfully  demonstrated  the  application 
of  the  model  to  nonholonomic  vehicles.  The  inclusion  of 
proximity  sensors  in  the  algorithm  as  a  detection  switch,  rather 
than  providing  a  detection  measurement,  enabled  agents  to 
navigate  around  obstacles  in  the  arena.  Future  research 
applying  the  Morse  potential  function  to  the 
entrapment/escorting  mission  might  investigate  its  application 
in  3 -dimensional  environments.  Additionally,  the  experiment 
performed  in  this  paper  could  be  reevaluated  using  a  robot  with 
higher  quality  sensing. 
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